Reproducing Suwannee River Hydrology workflow

Author

Jake Diamond

Published

July 25, 2024

1 Introduction

This report contains the workflow for the hydrological analysis of the Suwannee River basin originally developed by Matthew Cohen. The data used in this report come from:

  1. the PRISM dataset, a high-resolution (4 km) gridded dataset of temperature and precipitation covering the entire United States. Monthly data run from 1891 to the present, but the data used here are daily, from 1981 to the present, downloaded from the PRISM website.
  2. the Lake City rainfall gage, which is the best long-record gage in the basin.
  3. additional rainfall data from long-term gages at Mayo, Madison, High Springs, and Live Oak, which have variable periods of record with some missing data throughout. These come from the Florida Climate Center.
  4. the USGS streamflow data for 8 gages in the Suwannee River basin, downloaded from the USGS website.

The workflow is divided into the following sections:

  1. Load data
  2. Calculate cumulative anomalies
  3. Calculate the flow gain and loss for each subwatershed
  4. Analyze the flow loss, or reversal, patterns for each subwatershed

2 Load data

2.1 GIS data

The first step is to get the GIS layers, which we use to link the PRISM data to the subwatershed polygons. I downloaded the HUC8 shapefiles for the Suwannee River basin from the NHDPlus website; together they contain the 8 subwatershed polygons. I then transformed the shapefiles to the same CRS as the PRISM data (NAD83, EPSG:4269). The code to do this is below, but is commented out as it has already been run and takes some time.

Code
# All of this is commented out now to save on run time
# # Load the 4 most upstream shapefiles
# upper_suw <- st_read(here("data", "gis", "NHD_H_03110201_HU8_Shape", "Shape", "WBDHU8.shp")) %>%
#   st_transform(crs = 4269)
# alapaha <- st_read(here("data", "gis", "NHD_H_03110202_HU8_Shape", "Shape", "WBDHU8.shp")) %>%
#   st_transform(crs = 4269)
# withla <- st_read(here("data", "gis", "NHD_H_03110203_HU8_Shape", "Shape", "WBDHU8.shp")) %>%
#   st_transform(crs = 4269)
# little <- st_read(here("data", "gis", "NHD_H_03110204_HU8_Shape", "Shape", "WBDHU8.shp")) %>%
#   st_transform(crs = 4269)
# 
# # Combine the withla and little shapefiles to be withla
# withla <- st_union(withla, little) %>%
#   # sum all the areas of the huc10s (areasqkm), each time a union is made, 
#   #it adds a ".x" where x is the number of the union
#   mutate(areasqkm = sum(areasqkm, areasqkm.1)) %>%
#   # now drop all the other columns containing a "." in the name
#   dplyr::select(-contains("."))
# 
# # Load the santa fe and lower suwannee
# sf <- st_read(here("data", "gis", "NHD_H_03110206_HU8_Shape", "Shape", "WBDHU8.shp")) %>%
#   st_transform(crs = 4269)
# lower_suw <- st_read(here("data", "gis", "NHD_H_03110205_HU8_Shape", "Shape", "WBDHU8.shp")) %>%
#   st_transform(crs = 4269)
# 
# # Just get the huc10s for the lower suwannee so we can split it into middle and lower
# lower_suw_huc10 <- st_read(here("data", "gis", "NHD_H_03110205_HU8_Shape", "Shape", "WBDHU10.shp")) %>%
#   st_transform(crs = 4269)
# 
# # Merge the first four huc10 polygons into a single polygon, call that the middle suwannee
# middle_suw <- st_union(lower_suw_huc10[1,], lower_suw_huc10[2,]) %>%
#   st_union(st_union(lower_suw_huc10[3,], lower_suw_huc10[4,])) %>%
#   # sum all the areas of the huc10s (areasqkm), each time a union is made, 
#   #it adds a ".x" where x is the number of the union
#   mutate(areasqkm = sum(areasqkm, areasqkm.1, areasqkm.2, areasqkm.1.1)) %>%
#   # now drop all the other columns containing a "." in the name
#   dplyr::select(-contains(".")) %>%
#   mutate(name = "Middle Suwannee")
# 
# # Merge the last two huc10 polygons into a single polygon, call that the lower suwannee
# lower_suw <- st_union(lower_suw_huc10[5, ], lower_suw_huc10[6, ]) %>%
#   st_union(st_union(lower_suw_huc10[7, ], lower_suw_huc10[8, ])) %>%
#   # sum all the areas of the huc10s (areasqkm), each time a union is made, 
#   #it adds a ".x" where x is the number of the union
#   mutate(areasqkm = sum(areasqkm, areasqkm.1, areasqkm.2, areasqkm.1.1)) %>%
#   # now drop all the other columns containing a "." in the name
#   dplyr::select(-contains(".")) %>%
#   mutate(name = "Lower Suwannee")
# 
# # get all the shapefiles together
# all <- bind_rows(upper_suw, alapaha, withla, sf, middle_suw, lower_suw)
# 
# # Save this data for use later
# saveRDS(all, file = here("data", "gis", "suwannee_subwatersheds.RDS"))

Now we can load the GIS layers and take a look to make sure they are correct (Figure 1).

Code
# Load the suwannee subwatersheds
all <- readRDS(here("data", "gis", "suwannee_subwatersheds.RDS"))

# discharge gage numbers
gages <- c("02319000", "02317500", "02315500", "02319500", 
           "02320500", "02321500", "02322500", "02323500")

# Get the locations of the relevant USGS gages
usgs_meta <- readNWISsite(gages)

# load the springs shapefile, this comes from the Suwannee River Water Management District site
springs <- st_read(here("data", "gis", "springs_SRWMD_24k", "sprsrwmd24.shp")) %>%
  st_transform(crs = 4269)

# only want the springs that are within the watershed
springs <- st_intersection(springs, all)

# Load the florida shapefile
fl <- st_as_sf(maps::map("state", regions = "florida", fill=TRUE, plot =FALSE))

# now get the flowlines for the suwannee, alapaha, withla, and sf
suwannee_flowline <- st_read(here("data", "gis", "NHD_H_03110201_HU8_Shape", "Shape", "NHDFlowline.shp")) %>%
  st_transform(crs = 4269) %>%
  st_zm(drop = TRUE, what = "ZM") %>%
  filter(gnis_name == "Suwannee River")
alapaha_flowline <- st_read(here("data", "gis", "NHD_H_03110202_HU8_Shape", "Shape", "NHDFlowline.shp")) %>%
  st_transform(crs = 4269) %>%
  st_zm(drop = TRUE, what = "ZM") %>%
  filter(gnis_name == "Alapaha River")
withla_flowline <- st_read(here("data", "gis", "NHD_H_03110203_HU8_Shape", "Shape", "NHDFlowline.shp")) %>%
  st_transform(crs = 4269) %>%
  st_zm(drop = TRUE, what = "ZM") %>%
  filter(gnis_name == "Withlacoochee River")
little_flowline <- st_read(here("data", "gis", "NHD_H_03110204_HU8_Shape", "Shape", "NHDFlowline.shp")) %>%
  st_transform(crs = 4269) %>%
  st_zm(drop = TRUE, what = "ZM") %>%
  filter(gnis_name == "Little River")
sf_flowline <- st_read(here("data", "gis", "NHD_H_03110206_HU8_Shape", "Shape", "NHDFlowline.shp")) %>%
  st_transform(crs = 4269) %>%
  st_zm(drop = TRUE, what = "ZM") %>%
  filter(gnis_name == "Santa Fe River")
sw_flowline <- st_read(here("data", "gis", "NHD_H_03110205_HU8_Shape", "Shape", "NHDFlowline.shp")) %>%
  st_transform(crs = 4269) %>%
  st_zm(drop = TRUE, what = "ZM") %>%
  filter(gnis_name == "Suwannee River")

# merge all the flowlines to one geometry for easier plotting
flowlines <- bind_rows(suwannee_flowline, alapaha_flowline, 
                       withla_flowline, little_flowline, 
                       sf_flowline, sw_flowline)

# Plot of the watersheds, flowlines, gages, and springs with north arrow and scale bar
# and legend
p_main <- ggplot() +
  geom_sf(data = all) +
  geom_sf(data = flowlines, aes(color = "Major rivers"), size = 0.5) +
  geom_sf(data = springs, aes(fill = "Springs"), size = 1, shape = 21,
          color = "black") +
  geom_point(data = usgs_meta, aes(x = dec_long_va, y = dec_lat_va, fill = "USGS gages"), 
             color = "black", size = 2, shape = 21) +
  ggrepel::geom_text_repel(data = usgs_meta, aes(x = dec_long_va, y = dec_lat_va, label = site_no),
            color = "black", size = 3) +
  # add text for each subwatershed
  geom_sf_text(data = st_centroid(all), aes(label = name),
            color = "black", size = 3, nudge_x = c(0.05, 0, 0, 0, -0.5, -0.5),
            nudge_y = c(0.3, 0.4, 0.2 ,0.12, 0, 0)) +
  scale_color_manual(values = "black") +
  scale_fill_manual(values = c("lightblue", "white")) +
  annotation_north_arrow(location = "bl", which_north = "true", pad_x = unit(0.1, "in"), 
                         pad_y = unit(0.1, "in"), style = north_arrow_fancy_orienteering) +
  annotation_scale(location = "br", width_hint = 0.4, text_cex = 0.8) +
  theme(axis.title = element_blank(),
        panel.background = element_rect(fill = "white"),
        legend.title = element_blank(),
        legend.position = c(0.85, 0.12),
        legend.spacing.y = unit(-0.5, "cm"),
        legend.background = element_rect(fill = "transparent")) +
  labs(title = "Suwannee River subwatersheds",
       caption = "Data from USGS NWIS, NHDPlus, and SRWMD",
       x = "",
       y = "")

# add an inset of the florida map with the suwannee watershed
# dissolve the internal subwatershed boundaries
p_inset <- ggplot() +
  geom_sf(data = fl) +
  geom_sf(data = st_union(all)) +
  geom_sf(data = flowlines, color = "black", size = 0.3) +
  theme_minimal() +
  theme(axis.title = element_blank(),
        legend.title = element_blank(),
        axis.text = element_blank(),
        panel.grid = element_blank(),
        panel.background = element_rect(fill = "white"))
Figure 1: Map of study area with subwatersheds, gages, springs, and flowlines.

2.2 Rainfall data

Now we can link the PRISM data to each subwatershed to get the daily rainfall from 1981–2022. The first step is to download the PRISM data. The code to do this is below, but is commented out as it has already been run and takes some time.

Code
# # Download daily PRISM data since 1981 ------------------------------------
# # Set the prism download folder
# prism_set_dl_dir(here("data", "prism"))

#--- download PRISM precipitation data ---#
# Only want to do this once because it's huge and takes forever
# commented out after downloaded 
# get_prism_dailys(
#   type = "ppt",
#   minDate = "1981-01-01",
#   maxDate = "2022-12-31",
#   keepZip = FALSE
# )

# # Get the PRISM data for the subwatersheds --------------------------------
# # Again only want to do this once because it takes a while to load, code 
# # here for reproducibility
# # Create a stack of rasters for all the prism data
# prism_rasters <- pd_stack(prism_archive_ls())
# 
# # Load the Suwannee River basin shapefile with its 8 subwatershed polygons
# # and convert this to the same CRS as the prism data
# suw_sf <-  readRDS(here("data", "gis", "suwannee_subwatersheds.RDS")) %>%
#   st_transform(crs = terra::crs(prism_rasters[[1]]))
# 
# # Get the prism data by subwatershed, each prism point also contains the coverage fraction
# ppt_stack <- exact_extract(prism_rasters, suw_sf, progress = F)
# 
# #  Combine the extracted data into a single data frame, id is the subwatershed number
# ppt <- bind_rows(ppt_stack, .id = "id")
# 
# # Save these so don't have to load again
# saveRDS(ppt, here("data", "gis", "suwannee_prism_daily_ppt.RDS"))
# saveRDS(prism_rasters, here("data", "gis", "prism_daily_raster_stack.RDS"))

Now we link the PRISM data with each subwatershed polygon, weighting each grid cell by the “coverage fraction” returned by exact_extract(). This is done in the code below.

Code
# Load the previously calculated Suwannee River basin shapefile with its 8 subwatershed polygons
# and convert this to the same CRS as the prism data (NAD83)
suw_sf <-  readRDS(here("data", "gis", "suwannee_subwatersheds.RDS")) %>% #
  st_transform(crs = 4269)

# Get the PRISM data in usable format --------------------------------
# read in the prism data
ppt <- readRDS(here("data", "gis", "suwannee_prism_daily_ppt.RDS"))

# pivot to longer format, extract year and month and day from the column names
# column names are in the form: "PRISM_ppt_stable_4kmM3_YYYYMM_bil" (this was for monthly)
# column names are in the form: "PRISM_ppt_stable_4kmD1_YYYYMMDD_bil" (this is for daily)
ppt_df <- ppt %>%
  pivot_longer(cols = -c(id, coverage_fraction), names_to = "year_month_day", values_to = "ppt") %>%
  mutate(year = as.integer(str_sub(year_month_day, 24, 27)),
         month = as.integer(str_sub(year_month_day, 28, 29)),
         day = as.integer(str_sub(year_month_day, 30, 31)))

# Next, filter out rows with NA in the month column because those are yearly
# summaries from before 1981, and then
# calculate the coverage-weighted mean of ppt for each subwatershed and day
ppt_df <- ppt_df %>%
  filter(!is.na(month)) %>%
  group_by(id, year, month, day) %>%
  summarise(ppt_aw = sum(ppt * coverage_fraction) / sum(coverage_fraction)) %>%
  mutate(id = as.numeric(id))

# Link that summary data back with the polygons
suw_final <- suw_sf %>%
  mutate(id := seq_len(nrow(.))) %>%
  left_join(., ppt_df, by = "id")
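The coverage-fraction weighting above reduces to a simple weighted mean: each grid cell's rainfall is weighted by the fraction of the cell lying inside the polygon. A minimal Python sketch of that calculation (illustrative values, not PRISM data):

```python
def coverage_weighted_mean(values, fractions):
    """Weighted mean of grid-cell values, weighted by the fraction
    of each cell that falls inside the polygon."""
    num = sum(v * f for v, f in zip(values, fractions))
    return num / sum(fractions)

# a cell only half inside the polygon counts half as much
ppt = coverage_weighted_mean([10.0, 20.0, 30.0], [1.0, 1.0, 0.5])  # 18.0
```

This is what `sum(ppt * coverage_fraction) / sum(coverage_fraction)` computes per subwatershed-day in the R code above.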

Let’s take a look at average annual rainfall for each subwatershed from 1981–2022 (Figure 2).

Code
# Calculate annual rainfall each year by watershed
ppt_yr <- st_drop_geometry(suw_final) %>%
  group_by(name, areasqkm, id, year) %>%
  summarise(ppt_ann = sum(ppt_aw)) %>%
  ungroup() 

# Plot, arrange the bars by HUC
p_ppt_ann <- ggplot(ppt_yr, aes(x = fct_relevel(name, "Lower Suwannee", "Santa Fe", 
                                   "Middle Suwannee", "Withlacoochee", 
                                   "Alapaha", "Upper Suwannee"), 
                   y = ppt_ann)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1), geom = "errorbar", width = 0.2) +
  # y-axis minimum of 1000
  coord_cartesian(ylim = c(1000, NA)) +
  # scale_fill_viridis_c() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = "Subwatershed", y = "Average Annual Precipitation (mm)")
Figure 2: Mean annual rainfall (1981–2022) by subwatershed with standard deviation as error bars.

Now, for the years before 1981, we will use the Lake City rainfall gage data (observed data). Matt has already downloaded this and provided it. We will load the data, plot it (Figure 3 (a)), and calculate and plot the exceedance probability to get an idea of “big” and “small” storms (Figure 3 (b)).

Code
# Load the observed rainfall data and some analysis -----------------------
# load the measured Lake City rainfall data
lc_rain <- readxl::read_xlsx(here("data", "lake_city_rain.xlsx"))

# Plot the daily rainfall at Lake City
p_lc_rain <- ggplot(lc_rain, aes(x = as.Date(Date), y = `LCRain (mm)`)) +
  geom_line() +
  scale_x_date(date_labels = "%Y", date_breaks = "10 years") +
  theme_classic() +
  theme(axis.title.x = element_blank()) +
  labs(y = expression("observed rainfall at Lake City "*(mm~d^{-1})))

p_lc_rain

# Calculate the exceedance probability for the Lake City rainfall data
# need to first remove all zero rain days
lc_rain_exc <- rename(lc_rain, date = Date, ppt = `LCRain (mm)`) %>%
  filter(ppt > 0) %>%
  arrange(ppt) %>%
  mutate(rank = row_number(),
         exceedance = 1 - rank / n())

# create a dataframe for plotting line segments at the closest exceedances to 
# 10% and 20% with the corresponding rainfall values. need to use slice_min
# for each exceedance level
lc_rain_exc_lines <- lc_rain_exc %>%
  slice_min(abs(exceedance - 0.1)) %>%
  bind_rows(lc_rain_exc %>% slice_min(abs(exceedance - 0.2))) %>%
  bind_rows(lc_rain_exc %>% slice_min(abs(exceedance - 0.501))) %>%
  mutate(ppt = round(ppt, 1))

# plot the exceedance probability for the Lake City rainfall data on log scale, 
p_lc_exc <- ggplot(lc_rain_exc, aes(x = exceedance, y = ppt)) +
  geom_line() +
  scale_x_reverse(expand = expansion(mult = c(0, 0.01)),
                  breaks = seq(0,1,0.1)) +
  scale_y_log10(expand = expansion(mult = c(0, 0))) +
  # add the lines and labels from the dataframe
  geom_segment(data = lc_rain_exc_lines, aes(x = exceedance, xend = exceedance, 
                                             y = 0, yend = ppt), linetype = "dashed") +
  geom_segment(data = lc_rain_exc_lines, aes(x = 1, xend = exceedance, 
                                             y = ppt, yend = ppt), linetype = "dashed") +
  geom_text(data = lc_rain_exc_lines, aes(label = ppt, x = exceedance, y = ppt),
            nudge_y = 0.1) +
  annotation_logticks(sides = "l") +
  theme_classic() +
  labs(x = "exceedance probability", y = expression("rainfall "(mm~d^{-1})))

p_lc_exc
(a) Daily rainfall (1931–2022)
(b) Rainfall exceedance probability (1931–2022) at Lake City, with 50%, 20%, and 10% probabilities shown.
Figure 3: Lake City rainfall summary.
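The exceedance probability above is rank-based: sort the nonzero rain days ascending and take 1 − rank/n, so the largest observed daily total has an exceedance of zero. A small Python sketch of the same calculation:

```python
def exceedance(values):
    """Rank-based exceedance probability: sort ascending, then
    P(exceed) = 1 - rank/n, matching the R calculation above."""
    s = sorted(values)
    n = len(s)
    return [(v, 1 - (i + 1) / n) for i, v in enumerate(s)]

# the largest value is exceeded with probability 0,
# the smallest with probability (n-1)/n
probs = exceedance([5.0, 1.0, 3.0, 2.0])
```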

Let’s also take a look at the other long-term rain gages to check whether they are similar to Lake City. We will plot the daily data for each gage (Figure 4).

Code
# load the measured rain data from Madison, Mayo, Live Oak, and High Springs,
# these are all files that were downloaded from the Florida Climate Center website
# they are located in "data" "fcc" folder as .csv files
# the first part of the file name before the underscore is the site, want to use that
rain_meas <- list.files(here("data", "fcc"), full.names = T) %>%
  map_dfr(~read_csv(.x) %>% 
            mutate(site = str_extract(basename(.x), "^[^_]+"))) %>%
  # add a leading 0 before the month and day columns if only one digit
  mutate(MONTH = str_pad(MONTH, 2, pad = "0"),
         DAY = str_pad(DAY, 2, pad = "0")) %>%
  mutate(date = ymd(paste0(YEAR, MONTH, DAY)),
         ppt = if_else(precipitation < 0, NA_real_, precipitation * 25.4)) %>% #get into mm from inches
  select(site, date, ppt)

# Get all that data together
rain_meas <- rain_meas %>%
  bind_rows(lc_rain %>% 
              select(date = Date, ppt = `LCRain (mm)`) %>%
              mutate(site = "lakeCity"))

# Plot the daily rainfall at all gages
p_rain_meas <- ggplot(rain_meas, aes(x = as.Date(date), y = ppt)) +
  geom_line() +
  facet_wrap(~site) +
  scale_x_date(date_labels = "%Y", date_breaks = "20 years") +
  theme_classic() +
  theme(axis.title.x = element_blank()) +
  labs(y = expression("observed rainfall "*(mm~d^{-1})))
ggplotly(p_rain_meas)
Figure 4: Measured rainfall summary.

These data need a little cleaning. There are gaps at the beginning and in the middle of the datasets. I will remove the data at the beginning and end of each time series with large gaps, and fill the gaps in the middle using a linear fit against the best-correlated gage (Figure 5).

Code
# Remove the gaps at the front end of liveOak and madison,
# and filter gaps at the end of liveOak and mayo
rain_meas <- rain_meas %>%
  filter(!(site == "liveOak" & date < ymd(19520822)),
         !(site == "liveOak" & date > ymd(20160224)),
         !(site == "mayo" & date > ymd(20201231)),
         !(site == "madison" & date < ymd(19030527)),
         !(site == "highSprings" & date > ymd(20150221))) %>%
  drop_na(date) %>%
  group_by(site) %>%
  nest() %>%
  # trim NAs at front and end of data
  mutate(data = map(data, zoo::na.trim)) %>%
  unnest(data) %>%
  ungroup()

# Plot all the gages against each other
rain_meas %>%
  pivot_wider(names_from = site, values_from = ppt) %>%
  select(-date) %>%
  GGally::ggpairs()

# fill in NAs with best linear fit from other sites
rain_meas_fill <- rain_meas %>%
  mutate(best_cor = case_when(site == "highSprings" ~ "lakeCity",
                              site == "liveOak" ~ "mayo",
                              site == "madison" ~ "mayo",
                              site == "mayo" ~ "lakeCity")) %>%
  left_join(rename(rain_meas, ppt_cor = ppt), join_by(date, best_cor == site)) %>%
  mutate(ppt_fill = if_else(is.na(ppt) & ppt_cor == 0, 0, 
                       predict(lm(ppt ~ ppt_cor), .))) %>%
  mutate(ppt_clean = if_else(is.na(ppt), ppt_fill, ppt))
Figure 5: Correlations among rainfall gages
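The gap-filling step above can be sketched as ordinary least squares against the best-correlated donor gage, fit on days where both gages report, with the special case that a donor reading of zero rain fills as zero. A minimal Python sketch (hypothetical values; the R code uses `lm()` and `predict()`):

```python
def fill_gaps(target, donor):
    """Fill None values in `target` with a least-squares linear fit
    against `donor`, fit on days where both are observed.
    If the donor recorded zero rain, fill with zero directly."""
    pairs = [(x, y) for x, y in zip(donor, target)
             if x is not None and y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    b = (sum((x - mx) * (y - my) for x, y in pairs)
         / sum((x - mx) ** 2 for x, _ in pairs))
    a = my - b * mx
    return [y if y is not None
            else (0.0 if x == 0 else a + b * x)
            for x, y in zip(donor, target)]

# fit on days 1 and 3; fill day 2 with 0 (donor dry) and day 4 by regression
filled = fill_gaps([1.0, None, 3.0, None], [1.0, 0.0, 3.0, 4.0])
```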

Now let’s check to see how well the PRISM data matches the Lake City rainfall data. We will plot the two datasets together, first comparing daily (Figure 6 (a)), then monthly (Figure 6 (b)). It is clear that the daily data is not a great match, but the monthly data is much better. Still, for the moment, we will assume that daily data before 1981 is the same as the Lake City data, but afterwards is subwatershed-specific from PRISM.

Code
st_drop_geometry(suw_final) %>%
  mutate(date = ymd(paste(year, month, day))) %>%
  select(id, name, year, date, ppt_aw) %>%
  left_join(lc_rain %>%
            rename(lc_ppt = `LCRain (mm)`,
            date = Date)) %>%
  ggplot(aes(x = ppt_aw, y = lc_ppt)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  ggpubr::stat_regline_equation(label.y = 200, label.x = 30) +
  ggpubr::stat_cor(aes(label = after_stat(rr.label)), label.y = 175, label.x = 30) +
  facet_wrap(~name) +
  theme_classic() +
  labs(x = expression("subwatershed rainfall "(mm~d^{-1})), y = expression("Lake City rainfall "(mm~d^{-1})))

# Calculate the cumulative monthly rainfall for Lake City
lc_month <- lc_rain %>%
  mutate(date = as.Date(Date, format = "%m/%d/%Y"),
         year = year(date),
         month = month(date),
         day = day(date),
         ppt = as.numeric(`LCRain (mm)`)) %>%
  group_by(year, month) %>%
  summarise(ppt = sum(ppt)) %>%
  ungroup()

prism_month <- st_drop_geometry(suw_final) %>%
  group_by(name, year, month) %>%
  summarise(ppt = sum(ppt_aw)) %>%
  ungroup()

prism_month %>%
  select(name, year, month, ppt) %>%
  left_join(lc_month %>%
              select(year, month, lc_ppt = ppt)) %>%
  ggplot(aes(x = ppt, y = lc_ppt)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  ggpubr::stat_regline_equation(label.y = 200, label.x = 450) +
  ggpubr::stat_cor(aes(label = after_stat(rr.label)), label.y = 100, label.x = 450) +
  facet_wrap(~name) +
  theme_classic() +
  labs(x = expression("subwatershed rainfall "(mm~mo^{-1})), y = expression("Lake City rainfall "(mm~mo^{-1})))

# Assume all rainfall before 1981 is the same as Lake City data, afterwards is 
# subwatershed PRISM data
# Get a crossing of all the rainfall data for each subwatershed from Lake City before 1980
rain <- distinct(st_drop_geometry(suw_final), name) %>%
  crossing(distinct(lc_rain, Date) %>%
             filter(Date < ymd(19810101))) %>%
  left_join(lc_rain, by = "Date") %>%
  mutate(year = year(as.Date(Date, format = "%m/%d/%Y")),
         ppt = `LCRain (mm)`) %>%
  rename(date = Date) %>%
  bind_rows(select(st_drop_geometry(suw_final),
                   name, year, month, day, ppt = ppt_aw) %>%
              mutate(date = ymd(paste(year, month, day)))) %>%
  arrange(name, date) %>%
  select(-month, -day, -`LCRain (mm)`, -Day)

# save this data
# saveRDS(rain, here("data", "suwannee_rainfall.RDS"))
(a) Daily comparison (1981–2022)
(b) Monthly comparison (1981–2022)
Figure 6: Lake City versus PRISM rainfall
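The splicing rule above (one basin-wide observed series before 1981, subwatershed-specific PRISM afterwards) is just a cutoff-date merge. A purely illustrative Python sketch with dict-keyed daily series; the R code does this with `crossing()` and `bind_rows()` on data frames:

```python
from datetime import date

def splice_rainfall(observed, prism, cutoff=date(1981, 1, 1)):
    """Build one daily series: observed gage values before `cutoff`,
    PRISM values on and after it. Both inputs map date -> ppt (mm)."""
    out = {d: v for d, v in observed.items() if d < cutoff}
    out.update({d: v for d, v in prism.items() if d >= cutoff})
    return out

obs = {date(1980, 12, 31): 5.0, date(1981, 1, 1): 7.0}
pr = {date(1980, 12, 31): 4.0, date(1981, 1, 1): 6.0}
series = splice_rainfall(obs, pr)  # keeps 5.0 from obs, 6.0 from PRISM
```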

We can also do the same thing, but looking at all rainfall gages (Figure 7), not just Lake City. It looks like Lake City is a reasonable proxy for all the subwatersheds compared to the other rainfall gages, although the Mayo gage performs better in many subwatersheds, particularly in the Lower and Middle Suwannee.

Code
st_drop_geometry(suw_final) %>%
  mutate(date = ymd(paste(year, month, day))) %>%
  select(id, name, year, date, ppt_aw) %>%
  left_join(rain_meas_fill) %>%
  ggplot(aes(x = ppt_aw, y = ppt)) +
  geom_hex(bins = 20, show.legend = FALSE) +
  facet_grid(rows = vars(site), cols = vars(name)) +
  geom_smooth(method = "lm", se = FALSE) +
  ggpubr::stat_cor(aes(label = after_stat(rr.label)), label.y = 230, label.x = 30, color = "red") +
  theme_classic() +
  labs(x = "subwatershed precipitation (mm)", y = "observed precipitation (mm)")
Figure 7: Observed versus PRISM rainfall

2.3 Discharge data

Now we will load the discharge data from the USGS gages in the Suwannee River basin. We will calculate the cumulative daily discharge for each gage over the period of record and plot the timeseries for each gage on log-y axes (Figure 8).

Code
# Get the discharge of the relevant USGS gages
# There are 8 gages with data going back to 1931
gages <- c("02319000", "02317500", "02315500", "02319500", 
           "02320500", "02321500", "02322500", "02323500")

# # Download the daily discharge data from USGS, only do this once to save time
# # commented out after downloaded
# usgs_data <- readNWISdv(siteNumbers = gages, parameterCd = "00060", 
#                         startDate = "1931-01-01", endDate = "2022-12-31")

# # Calculate the running cumulative discharge over the period of record for each gage
# flow <- usgs_data %>%
#   rename(Q = X_00060_00003,
#          date = Date) %>%
#   # convert to cubic meters per second
#   mutate(Q = Q * 0.0283168)

# saveRDS(flow, here("data", "suwannee_discharge.RDS"))
flow <- readRDS(here("data", "suwannee_discharge.RDS"))
  
# Plot the timeseries for each site on log-y axes
ggplot(flow, aes(x = date, y = Q)) +
  geom_line() +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x))) +
  annotation_logticks(sides = "l") +
  facet_wrap(~site_no, scales = "free_y") +
  theme_classic() +
  labs(x = "date",
       y = expression("discharge"~(m^3~s^{-1})))
Figure 8: Discharge data from USGS gages in the Suwannee River basin.
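The conversion factor 0.0283168 in the code above is one cubic foot expressed in cubic meters, i.e. (0.3048 m/ft)³, since USGS parameter 00060 reports discharge in ft³/s. A one-line Python sketch:

```python
FT3_TO_M3 = 0.3048 ** 3  # one cubic foot in cubic meters (~0.0283168)

def cfs_to_cms(q_cfs):
    """Convert discharge from ft^3/s (USGS parameter 00060) to m^3/s."""
    return q_cfs * FT3_TO_M3

q = cfs_to_cms(1000.0)  # a 1000 cfs flow is about 28.3 m^3/s
```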

3 Calculate cumulative anomalies

With data in hand, now we can calculate the cumulative anomalies for each subwatershed. First, however, we will calculate the cumulative rainfall and discharge over time for each subwatershed. The deviations from the fitted line of these timeseries will be used to calculate the anomalies. Cumulative rainfall patterns vary quite little across watersheds (Figure 9 (a)), but the discharge patterns are quite different (Figure 9 (b)).

Code
# calculate the cumulative rainfall and discharge for each subwatershed
# first need to link each watershed USGS gage with its discharge data
df <- flow %>%
  mutate(name = case_when(
    site_no == "02319000" ~ "Withlacoochee",
    site_no == "02317500" ~ "Alapaha",
    site_no == "02315500" ~ "Upper Suwannee",
    site_no == "02319500" ~ "Upper Suwannee",
    site_no == "02320500" ~ "Middle Suwannee",
    site_no == "02321500" ~ "Santa Fe",
    site_no == "02322500" ~ "Santa Fe",
    site_no == "02323500" ~ "Lower Suwannee")) %>%
  select(site_no, name, date, Q) %>%
  filter(year(date) >= 1933) %>%
  left_join(select(rain, name, date, ppt), 
                   by = c("name", "date"))

# Calculate the cumulative rainfall and discharge for each site
df <- df %>%
  group_by(site_no) %>%
  arrange(date) %>%
  mutate(cum_ppt = cumsum(ppt),
         cum_Q = cumsum(Q),
         cum_D = row_number())

# Plot the cumulative rainfall and discharge by cumulative days for each site
# include best fit line with force fit through 0
# First just rainfall
p_cumP <- ggplot(df, aes(x = cum_D, y = cum_ppt)) +
  geom_point(size = 0.5) +
  geom_smooth(method = "lm", se = F, formula = y ~ x + 0, linewidth = 0.3) +
  facet_wrap(~name) +
  ggpubr::stat_regline_equation(formula = y ~ x + 0,
                                label.y.npc = 0.8) +
  theme_classic() +
  labs(x = "cumulative days",
       y = "cumulative rainfall (mm)")
p_cumP

# Now do discharge, but do it in km^3/d (currently in m3/s),  and include on each panel the
# cumulative discharge (y-axis) against both cumulative rainfall ("blue") and cumulative days ("red")
p_cumQ <- ggplot(df, aes(y = cum_Q * 86400 / 1e9)) +
  geom_line(aes(x = cum_ppt, color = "cum_ppt"), linewidth = 0.5) +
  geom_line(aes(x = cum_D, color = "cum_D"), linewidth = 0.5) +
  geom_smooth(aes(x = cum_D), method = "lm", se = F, formula = y ~ x + 0, 
              linewidth = 0.3, linetype = "dashed", color = "black") +
  geom_smooth(aes(x = cum_ppt), method = "lm", se = F, formula = y ~ x + 0, 
              linewidth = 0.3, linetype = "dashed", color = "black") +
  facet_wrap(~site_no) +
  ggpubr::stat_regline_equation(aes(x = cum_D), formula = y ~ x + 0,
                                label.y.npc = 0.8) +
  ggpubr::stat_regline_equation(aes(x = cum_ppt), formula = y ~ x + 0,
                                label.y.npc = 0.7) +
  theme_classic() +
  guides(color = guide_legend(title = NULL)) +
  scale_color_manual(values = c("cum_ppt" = "blue", "cum_D" = "red"),
                     labels = c("cumulative rainfall", "cumulative days"))+
  labs(x = "cumulative days or cumulative rainfall (mm)",
       y = expression("cumulative discharge "(km^3))) +
  theme(legend.position = c(0.8, 0.2))
p_cumQ
(a) Cumulative rainfall by subwatershed (1933–2022)
(b) Cumulative discharge by subwatershed (1933–2022)
Figure 9: Cumulative rainfall and discharge by subwatershed

Now we calculate the anomalies for each subwatershed. We will use the residuals from the best-fit line of the cumulative rainfall and discharge data to calculate the anomalies. We will then plot the anomalies for each subwatershed.

Rainfall anomalies are generally coherent across subwatersheds within the upper and lower parts of the basin. Upper-watershed subwatersheds exhibited a peak of positive anomalies in the 1980s and 90s, whereas lower-watershed subwatersheds show an increasing trend over time (Figure 10 (a)), with a sharp peak in recent years. Discharge anomalies follow this split into upper and lower watershed behavior. The most downstream sites exhibited consistent positive anomalies from the 1960s to the 90s, but have shown a sharp downturn over the past two decades (Figure 10 (b)). This recent decrease was common to most sites, except for the Alapaha and Withlacoochee gages. There was very little difference between the cumulative-rainfall and cumulative-day approaches to the discharge anomalies, suggesting the signals are robust.

Code
# Now we calculate the anomalies for each subwatershed. We will use the residuals 
# from the best-fit line of the cumulative rainfall and discharge data to calculate 
# the anomalies. We will then plot the anomalies for each subwatershed.
ppt_anom <- df %>%
  group_by(name) %>%
  nest() %>%
  mutate(lm = map(data, ~lm(cum_ppt ~ cum_D + 0, data = .)),
         residuals = map2(data, lm, ~mutate(.x, residual = .x$cum_ppt - predict(.y, .x)))) %>%
  unnest(residuals) %>%
  ungroup() %>%
  select(-lm, -data)

# plot the residuals over time
p_res_ppt_time <- ggplot(ppt_anom, aes(x = date, y = residual)) +
  geom_line() +
  facet_wrap(~name, scales = "free_y") +
  geom_hline(yintercept = 0) +
  theme_classic() +
  labs(x = "date",
       y = "rainfall anomaly (mm)")
p_res_ppt_time

# Now do discharge, but do it in km^3/d (currently in m3/s),  and include on each panel the anomalies from
# cumulative discharge (y-axis) against both cumulative rainfall ("blue") and cumulative days ("red")
Q_anom <- df %>%
  group_by(site_no) %>%
  nest() %>%
  mutate(lm_D = map(data, ~lm(cum_Q ~ cum_D + 0, data = .)),
         residuals_D = map2(data, lm_D, ~mutate(.x, res_D = .x$cum_Q - predict(.y, .x))),
         lm_ppt = map(data, ~lm(cum_Q ~ cum_ppt + 0, data = .)),
         residuals_ppt = map2(data, lm_ppt, ~transmute(.x, res_ppt = .x$cum_Q - predict(.y, .x)))) %>%
  unnest(c(residuals_D, residuals_ppt)) %>%
  ungroup() %>%
  select(-lm_D, -lm_ppt, -data)

# plot the anomalies over time
p_res_Q_time <- ggplot(Q_anom, aes(x = date)) +
  geom_line(aes(y = res_D * 86400 / 1E9, color = "cum_D")) +
  geom_line(aes(y = res_ppt * 86400 / 1E9, color = "cum_ppt")) +
  facet_wrap(~site_no, scales = "free_y") +
  geom_hline(yintercept = 0) +
  theme_classic() +
  scale_color_manual(name = "", values = c("cum_ppt" = "blue", "cum_D" = "red"),
                     labels = c("cumulative rainfall", "cumulative days"))+
  labs(y = expression("flow anomaly "(km^3))) +
  theme(legend.position = c(0.8, 0.2))
p_res_Q_time
(a) Rainfall anomaly by subwatershed (1933–2022)
(b) Discharge anomaly by subwatershed (1933–2022)
Figure 10: Rainfall and discharge anomalies over time
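The anomaly at each date is the residual of the cumulative series from a least-squares line forced through the origin (slope b = Σxy / Σx² for a zero-intercept fit). A minimal Python sketch with illustrative numbers, using cumulative days as the predictor as in the R code:

```python
def cumulative_anomaly(series):
    """Residuals of the cumulative sum from a zero-intercept
    least-squares fit against cumulative days (1, 2, ..., n)."""
    cum, total = [], 0.0
    for v in series:
        total += v
        cum.append(total)
    days = range(1, len(cum) + 1)
    # slope of y = b*x through the origin: b = sum(x*y) / sum(x^2)
    b = sum(d * c for d, c in zip(days, cum)) / sum(d * d for d in days)
    return [c - b * d for d, c in zip(days, cum)]

# a perfectly steady series accumulates exactly along the fitted line,
# so its anomaly is zero everywhere
anoms = cumulative_anomaly([2.0, 2.0, 2.0, 2.0])
```

Positive anomalies mean the cumulative total is running ahead of the long-term rate; negative anomalies mean it is running behind.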

Now let’s compare all on the same chart to aid in comparison.

Code
# do the same thing, but all on one plot, only lower, santa fe, middle, upper, and lake city
# lake city are observed and plotted as dashed lines, the other subwatersheds are solid and colored
p_res_P_time2 <- ppt_anom %>% 
  filter(name %in% c("Lower Suwannee", "Santa Fe", "Middle Suwannee", 
                     "Upper Suwannee")) %>%
  ggplot(aes(x = date, y = residual / 1000, color = name)) +
  scale_color_manual(values = c("Lower Suwannee" = "darkgrey", "Santa Fe" = "blue", 
                                "Middle Suwannee" = "black", "Upper Suwannee" = "red")) +
  geom_line() +
  geom_hline(yintercept = 0) +
  theme_classic() +
  theme(legend.title = element_blank(),
        legend.position = c(0.15, 0.85)) +
  labs(x = "date",
       y = "rainfall anomaly (m)")
p_res_P_time2

# Now the same for discharge, but want to get in units of meters for easier comparison,
# need to divide by watershed areas
p_res_Q_time2 <- Q_anom %>% 
  # filter(site_no %in% c("02323500", "02322500", "02320500", # this wasn't "upstream" anomalies
  #                       "02319500")) %>%
  # left_join(dataRetrieval::readNWISsite(c("02323500", "02322500", "02320500", 
  #                                       "02319500")) %>%
  filter(site_no %in% c("02322500", "02321500", "02315500", 
                        "02319500")) %>%
  left_join(dataRetrieval::readNWISsite(c("02322500", "02321500", "02315500", 
                        "02319500")) %>%
              select(site_no, area_mi2 = drain_area_va)) %>%
  mutate(areasqkm = area_mi2 * 2.58999) %>% # 1 mi^2 = 2.58999 km^2
  mutate(res_D = res_D / areasqkm * 86400 / 1000^2,
         res_ppt = res_ppt / areasqkm * 86400 / 1000^2) %>%
  ggplot(aes(x = date, y = res_D, color = site_no)) +
  scale_color_manual(values = c("02322500" = "darkgrey", "02321500" = "blue",
                                "02319500" = "black", "02315500" = "red"),
                     labels = c("Upper Suwannee", "Middle Suwannee", 
                                "Santa Fe", "Lower Suwannee")) +
  geom_line() +
  geom_hline(yintercept = 0) +
  theme_classic() +
  theme(legend.title = element_blank(),
        axis.title.x = element_blank(),
        legend.position = c(0.15, 0.85)) +
  labs(y = "cumulative temporal upstream flow anomaly (m)")
p_res_Q_time2
(a) Rainfall anomaly by subwatershed (1933–2022)
(b) Discharge anomaly by subwatershed (1933–2022)
Figure 11: Rainfall and discharge anomalies over time

Finally, let’s take a look at the other measured rain gage anomalies over time to see whether they match the PRISM data and whether the trends at Lake City are reproduced. It’s hard to draw strong conclusions, but the Lake City data do not appear representative of the trends at the other long-term rainfall gages (Figure 12). It is unclear what this means for our inferences regarding rainfall over time.

Code
ppt_cum_meas <- rain_meas_fill %>%
  select(site, date, ppt = ppt_clean) %>%
  group_by(site) %>%
  mutate(ppt = if_else(is.na(ppt), 0, ppt)) %>%
  mutate(cumP = cumsum(ppt),
         cumD = row_number()) %>%
  ungroup()

# Plot the running cumulative rainfall for each site and the best fit line
# with equation, with forced 0 intercept
p_ppt_cum_meas <- ggplot(ppt_cum_meas, aes(x = cumD, y = cumP)) +
  geom_line(linewidth = 0.5) +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x + 0, linewidth = 0.3) +
  ggpubr::stat_regline_equation(formula = y ~ x + 0,
                                label.y.npc = 0.8) +
  facet_wrap(~site) +
  theme_classic() +
  labs(x = "days", y = "cumulative precipitation (mm)")
# p_ppt_cum_meas

# Now calculate the cumulative anomalies for each subwatershed and Lake City
ppt_anom_meas <- ppt_cum_meas %>%
  group_by(site) %>%
  nest() %>%
  mutate(lm = map(data, ~lm(cumP ~ cumD + 0, data = .)),
         residuals = map2(data, lm, ~mutate(.x, residual = .x$cumP - predict(.y, .x)))) %>%
  unnest(residuals) %>%
  ungroup() %>%
  select(-lm, -data)

# plot the residuals over time
p_resTime_meas <- ggplot(ppt_anom_meas, aes(x = date, y = residual)) +
  geom_line() +
  facet_wrap(~site) +
  geom_hline(yintercept = 0) +
  theme_classic() +
  labs(x = "date",
       y = "rainfall anomaly (mm)")
p_resTime_meas
Figure 12: Observed rain gage anomalies over time

4 Flow gain analysis

Now let’s look at how much flow is gained between consecutive gages. This informs whether there are gaining or losing stream conditions. The prevalence of losing stream conditions is what’s of interest here, as it indicates the amount of surface water flowing back into the springs. For each site pair, let’s look at a chart of 1) the flow gain over the record, 2) the flow gain since 1998, 3) the flow gain as a function of upstream flow, and 4) the histogram of the flow gain.

4.1 Travel time between gages

First, we should look at the hydraulic travel time between gages to align the flow data. This is done by dividing the distance between gages by the flow velocity between them. The distance between gages is calculated along the river shapefile using the “riverdist” package in R.
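As a rough illustration of that calculation (the distance and velocity below are made-up values, not actual gage-pair numbers):

```r
# travel time (days) = distance (m) / velocity (m/s) / 86400 (s/day)
dist_m <- 60000                  # hypothetical 60 km between gages
v_ms   <- 0.35                   # hypothetical mean velocity (m/s)
tt_days <- dist_m / v_ms / 86400
tt_days                          # ~2 days
```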

Code
# # This is commented out for the moment to avoid loading a bunch of shapefiles
# library(riverdist)
# 
# # turn the flowlines (loaded in the first steps) into a network object
# suw_net <- line2network(summarize(flowlines), reproject = "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs") # needs a projected CRS; using Albers Equal Area (NAD83)
# 
# # Turn the USGS gage sites into a shapefile with the same projection
# usgs_pts <- st_as_sf(usgs_meta, coords = c("dec_long_va", "dec_lat_va"), crs = 4269) %>%
#   st_transform(crs = "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs")
# 
# # Turn those points into vertices on the network
# pts <- xy2segvert(x = st_coordinates(usgs_pts)[,1], y = st_coordinates(usgs_pts)[,2], rivers = suw_net)
# 
# # distances between points, a matrix of all possible distances between points
# dist_mat <-riverdistancemat(pts$seg, pts$vert, suw_net)
# 
# # only want distances between 2 and 4 ("ala_upper"), 3 and 4 ("withla_upper"),
# # 1 and 4 ("upper"), 4 and 5 ("middle"), 5 and 8 ("lower"), 6 and 7 ("sf") and 7 and 8 ("sf_lower")
# df_dists <- as_tibble(dist_mat) %>%
#   mutate(pt = row_number()) %>%
#   pivot_longer(cols = -pt) %>%
#   filter((pt == 2 & name == 4) |
#            (pt == 3 & name == 4) |
#            (pt == 1 & name == 4) |
#            (pt == 4 & name == 5) |
#            (pt == 5 & name == 8) |
#            (pt == 6 & name == 7) |
#            (pt == 7 & name == 8)) %>%
#   mutate(gage_pair = c("ala-suw", "withla-suw", "upper", "upper-mid", "mid-lower", "sf", "sf-lower")) %>%
#   mutate(site_no = c("02317500", "02319000", "02315500", "02319500", "02320500", "02321500", "02322500")) %>%
#   select(site_no, gage_pair, dist_m = value)
# 
# saveRDS(df_dists, here("data", "suwannee_gage_distances.RDS"))

Now, we calculate a discharge-dependent flow velocity from measured data from the USGS, then predict velocity every day for each gage pair (Figure 13).

Code
# Get the velocity data from the USGS
df_vel <- readNWISmeas(gages, expanded = TRUE) %>%
  select(site_no, date = measurement_dt, Q = discharge_va, v = chan_velocity) %>%
  mutate(Q = Q * 0.0283168, #ft3/s to m3/s
         v = v * 0.3048) %>% #ft/s to m/s
  filter(!is.na(v), v > 0, v < 2) # remove NA and unreasonable values

# plot velocity as a function of discharge for each gage
# fits are weakest at 02319000 and 02315500; otherwise good enough for purposes here
p_vel <- ggplot(df_vel, aes(x = Q, y = v, color = year(date))) +
  geom_point() +
  geom_smooth(method = "glm", se = FALSE) +
  scale_x_log10() +
  scale_y_log10() +
  facet_wrap(~site_no, scales = "free_y") +
  theme_classic() +
  labs(x = expression("discharge"~(m^3~s^{-1})),
       y = expression("velocity"~(m~s^{-1})))
p_vel

# get velocity as a function of discharge for each gage
df_vel_mod <- df_vel %>%
  group_by(site_no) %>%
  nest() %>%
  mutate(lm = map(data, ~lm(log(v) ~ log(Q), data = .))) %>%
  select(-data)
Figure 13: Velocity versus discharge for each gage on log axes.

Now, we calculate a travel time for each day between gage pairs, and look at the distribution (Figure 14). Travel time is typically 2 days between gages, but this varies, and we can account for this variation.

Code
# first load distances between gages
df_dists <- readRDS(here("data", "suwannee_gage_distances.RDS"))

# now calculate for each day the velocity between gages and then the travel time
# between gages
df_tt <- df %>%
  select(site_no, date, Q) %>%
  group_by(site_no) %>%
  nest() %>%
  left_join(df_vel_mod, by = "site_no") %>%
  mutate(v = map2(data, lm, ~mutate(.x, v = exp(predict(.y, .x))))) %>%
  unnest(v) %>%
  select(-data, -lm) %>%
  left_join(df_dists) %>%
  mutate(tt = dist_m / v / 86400) %>% #travel times in days
  select(site_no, gage_pair, date, Q, v, tt)

# get the median travel time for each gage pair for plotting
df_tt_med <- df_tt %>%
  filter(tt > 0, tt < 5) %>%
  group_by(gage_pair) %>%
  summarise(median_tt = median(tt, na.rm = TRUE))

# look at histograms travel times between gages
ggplot(df_tt %>% filter(tt > 0, tt < 5),
       aes(x = tt)) +
  geom_histogram() +
  geom_vline(data = df_tt_med, aes(xintercept = median_tt)) +
  geom_text(data = df_tt_med, aes(x = 2.5, y = 5000, label = paste0("median = ", round(median_tt, 2)))) +
  facet_wrap(~gage_pair, scales = "free_x") +
  theme_classic() +
  labs(x = "travel time (days)",
       y = "count")
Figure 14: Travel time in days between gages.

4.2 Flow gain between gages

Now we will estimate flow gain as the difference between gages, with a dynamic lag based on the estimated travel time each day. We will then plot the flow gain for each gage pair over time (Figure 15). There is clear evidence of large flow gain in the upper watershed which rapidly diminishes in the lower watershed and can even reverse.

Code
# Get a dataframe of the gage pairs and tributary flows
gage_pairs <- tibble(#gage_pair = c("withla_suw", "ala_suw", "upper", "upper_mid", "mid_lower", "sf_lower", "sf"),
  gage_pair = c("upper", "middle", "lower", "sf"),
  usGage = c("02315500", "02319500", "02320500", "02321500"),
  dsGage = c("02319500", "02320500", "02323500", "02322500"),
  tribGage1 = c("02319000", NA_character_, "02322500", NA_character_),
  tribGage2 = c("02317500", NA_character_,  NA_character_, NA_character_),
  )

# now calculate the flow gains between gages, accounting for travel time between gages
# First make a function that dynamically lags a column by a given number of days
# the inputs are the column of data to dynamically lag, and the column of lags
lag_fun <- function(col, lag) {
  # first make sure that lag values are rounded to whole numbers
  lag <- round(lag)
  # now turn any lags greater than 5 days into 5, and any lags less than 0 into 0
  lag <- ifelse(lag > 5, 5, ifelse(lag < 0, 0, lag))
  # can't use dplyr lag because it doesn't allow for dynamic lags
  # will need to make a for loop
  out <- numeric(length(col))
  for (i in 1:length(col)) {
    if (i - lag[i] > 0) {
      out[i] <- col[i - lag[i]]
    } else {
      out[i] <- NA
    }
  }
  return(out)
}
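A quick sanity check of `lag_fun` on a toy vector (illustrative only, not part of the original workflow):

```r
# each element is pulled from `lag` positions earlier; lags that reach
# before the start of the series return NA
x    <- c(10, 20, 30, 40)
lags <- c(0, 1, 1, 2)
lag_fun(x, lags)  # 10 10 20 20
```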

# Now get the flow gains, accounting for lags
df_fg <- df_tt %>%
  select(site_no, date, Q, tt) %>%
  left_join(gage_pairs, join_by(site_no == usGage)) %>%
  drop_na(gage_pair) %>%
  group_by(gage_pair) %>%
  left_join(df %>% 
              select(date, dsGage = site_no, dsQ = Q)) %>%
  left_join(df_tt %>%
              select(date, tribGage1 = site_no, trib1Q = Q, trib1tt = tt)) %>%
  left_join(df_tt %>%
              select(date, tribGage2 = site_no, trib2Q = Q, trib2tt = tt)) %>%
  mutate(trib1Q    = if_else(is.na(trib1Q), 0, trib1Q),
         trib2Q    = if_else(is.na(trib2Q), 0, trib2Q),
         trib1tt   = if_else(is.na(trib1tt) | is.infinite(trib1tt), 0, trib1tt),
         trib2tt   = if_else(is.na(trib2tt) | is.infinite(trib2tt), 0, trib2tt),
         tt        = if_else(is.na(tt) | is.infinite(tt), 0, tt),
         trib1Qlag = lag_fun(trib1Q, trib1tt),
         trib2Qlag = lag_fun(trib2Q, trib2tt),
         usQlag    = lag_fun(Q, tt),
         flow_gain = dsQ - usQlag - trib1Qlag - trib2Qlag,
         flow_gain_nolag = dsQ - Q - trib1Q - trib2Q) %>%
  rename(usQ = Q) %>%
  ungroup() %>%
  filter(!(gage_pair == "lower" & date < "1941-10-01")) # remove data before gage 02322500 was installed

# Plot the time series of flow gain, and flows at the upstream and downstream gages
# with flow gain on the y-axis and the flows on a secondary y-axis, this needs 
# to be accomplished with a second plot and then combining them using patchwork package
p_fg <- ggplot(df_fg, aes(x = date, y = flow_gain)) +
  geom_line() +
  facet_wrap(~gage_pair) +
  scale_y_continuous(limits = c(-1500, 300)) +
  geom_hline(yintercept = 0) +
  theme_classic() +
  theme(panel.background = element_rect(fill = "transparent",colour = NA),
        plot.background = element_rect(fill = "transparent", colour = NA),
        axis.title.x = element_blank()) +
  labs(y = expression("flow gain"~(m^3~s^{-1})))
       
p_fg_Q <- ggplot(df_fg, aes(x = date)) +
  geom_line(aes(y = usQ, color = "upstream gage")) +
  geom_line(aes(y = dsQ, color = "downstream gage")) +
  scale_y_log10(position = "right", limits = c(1, 1E5),
                breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x))) +
  facet_wrap(~gage_pair) +
  theme_classic() +
  scale_color_manual(values = c("upstream gage" = "blue", "downstream gage" = "red"), name = "") +
  theme(panel.background = element_rect(fill = "transparent",colour = NA),
        plot.background = element_rect(fill = "transparent", colour = NA),
        axis.text.y = element_text(color = c("blue", "red")),
        axis.title.y = element_text(color = c("blue", "red")),
        legend.title = element_blank(),
        legend.background = element_rect(fill = "transparent"),
        axis.title.x = element_blank(),
        legend.position = c(0.15, 0.57)) +
  labs(y = expression("flow"~(m^3~s^{-1})))

# Combine the two plots
p_fg_all <- p_fg + p_fg_Q + plot_layout(design = c(area(1, 1, 2, 2),
                                                   area(1, 1, 2, 2)))
p_fg_all
Figure 15: Time series of upstream and downstream flows (colors, secondary y-axis on log scale) and flow gain (y-axis) between gages.

We can look at the time series dynamically to confirm that peaks in the downstream gage do indeed lag the upstream gage and tributary flows by approximately 2 days. Let’s just use the Upper Suwannee as an example (Figure 16) – you can zoom and explore the data.

Code
df_fg %>%
  filter(gage_pair == "upper") %>%
  plot_ly(x = ~date) %>%
  add_lines(y = ~usQ, name = "upstream gage") %>%
  add_lines(y = ~dsQ, name = "downstream gage") %>%
  add_lines(y = ~trib1Q, name = "tributary 1") %>%
  add_lines(y = ~trib2Q, name = "tributary 2") %>%
  layout(title = "Flow at upstream, downstream, and tributary gages -- Upper Suwannee",
         xaxis = list(title = "date"),
         yaxis = list(title = "flow (m3/s)"))
Figure 16: Time series of upstream, tributary, and downstream flows for the Upper Suwannee section.

If we summarize these data, the patterns become clearer (Figure 17). Particularly in the Middle Suwannee, above approximately 400 m3/s, the flow gain is consistently negative, indicating losing stream conditions. This is also true for the Santa Fe River, though the trend is slightly less strong. The Upper and Lower Suwannee have more consistently positive flow gains, but still exhibit negative flow gain at their highest flows.

For gage pairs that have tributary flow (i.e., “upper”, “lower”), I have subtracted the tributary flow from the downstream gage flow, also accounting for the dynamic travel time each day.

Code
ggplot(data = df_fg,
       aes(x = usQlag, y = flow_gain)) +
  scale_x_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x))) +
  geom_hline(yintercept = 0) +
  stat_summary_bin() +
  geom_smooth(method = "loess", se = FALSE) +
  facet_wrap(~gage_pair, scales = "free") +
  theme_classic() +
  scale_color_viridis_c() +
  labs(x = expression("discharge at upstream gage"~(m^3~s^{-1})),
       y = expression("flow gain"~(m^3~s^{-1})))
Figure 17: Summary of flow gain by discharge (log axis), with points representing the mean (+/-se) of flow gain at discharge bins, blue curve represents loess fit.

We can take a closer look at the “middle” gage pair (between the Upper and Middle Suwannee gages), as it appears the most interesting. It looks like the prevalence of negative flow gain changed around the year 2000 (Figure 18), with less negative flow gain in recent years.

Code
# Plot the flow gains between gages as a function of the flow at the downstream gage,
# colored by year, with a loess smooth, faceted by gage pair
# First look at an example plot for the upper_mid section
p_mid_fg <- ggplot(data = filter(df_fg, gage_pair == "middle"),
       aes(x = usQ, y = flow_gain, color = year(date))) +
  scale_x_log10() + 
  geom_hline(yintercept = 0) +
  geom_point(alpha = 0.8, size = 0.7) +
  geom_smooth(method = "loess", se = FALSE) +
  theme_classic() +
  # want to color the points so that values less than 2000 are shades of blue
  # and values greater than 2000 are shades of red
  scale_color_gradient2(low = "blue", mid = "white", high = "red", midpoint = 2000) +
  coord_cartesian(ylim = c(-500, 100)) +
  theme(legend.position = c(0.2, 0.2)) +
  labs(x = expression("discharge at upstream gage"~(m^3~s^{-1})),
       y = expression("flow gain"~(m^3~s^{-1})),
       color = "year")
p_mid_fg

# now plot the histograms of flow gain for the upper_mid gage_pair colored by before or after 2000
p_fg_dens <- ggplot(data = filter(df_fg, gage_pair == "middle"),
       aes(x = flow_gain, after_stat(scaled), fill = year(date) > 2000)) +
  geom_density(alpha = 0.5) +
  coord_cartesian(xlim = c(-100, 100)) +
  geom_vline(xintercept = 0) +
  scale_fill_manual(labels = c("year < 2000", "year >= 2000"),
                    values = c("FALSE" = "blue", "TRUE" = "red")) +
  theme_classic() +
  theme(legend.position = c(0.8, 0.8),
        legend.title = element_blank()) +
  labs(x = expression("flow gain"~(m^3~s^{-1})),
       y = "scaled density")
p_fg_dens
(a) Flow gain as a function of upstream discharge, colored by year.
(b) Scaled density of flow gain, colored by year relative to 2000.
Figure 18: Flow gain in the Middle Suwannee River gage pair (02319500 and 02320500) over time.

5 Flow reversals

We define a flow reversal as any time that flow gain is negative. The first thing to check is the time series of reversals for each gage pair. We can plot the discharge at which a reversal happens as a function of time because there is reason to believe that the discharge necessary to induce a reversal is changing over time. There are no clear breakpoints in the records of reversals upon visual inspection, but perhaps there is an increase in the reversals at lower upstream discharges (Figure 19)?

Code
# now that we have flow gains for each gage pair, we can look at the reversal analysis
# if the flow gain is negative, the day is considered a flow reversal
# we will calculate first the number of flow reversals for each gage pair,
# then the cumulative flow volume (km3) of flow reversals for each gage pair
# then we will do the same anomaly analysis as above, but for flow reversals
df_rev <- df_fg %>%
  mutate(reversal = flow_gain < 0) %>%
  group_by(gage_pair) %>%
  mutate(cum_rev_n = cumsum(replace_na(reversal, 0)),
         cum_rev_vol = cumsum(-replace_na(flow_gain, 0) * replace_na(reversal, 0)),
         cumD = row_number()) %>%
  ungroup()

df_rev %>%
  filter(reversal == TRUE) %>%
  ggplot(aes(x = date, y = usQlag)) +
  geom_point(color = "grey", alpha = 0.5) +
  scale_y_log10() +
  facet_wrap(~gage_pair, scales = "free") +
  theme_classic() +
  labs(x = "date",
       y = expression("discharge at upstream gage leading to flow reversal"~(m^3~s^{-1})))
Figure 19: Upstream discharge that induces flow reversals over time.

The fact that there are reversals when upstream flow is so low at some sites is interesting. It could be because rating curves are poorly defined at low flows…let’s take a quick look at the stage-discharge curves. There does appear to be considerable scatter in the curves at low flows (Figure 20). In the future, we can estimate the discharge uncertainty at each gage across flows, but for now, let’s assume a general 20% uncertainty in discharge at low flows. We will use this uncertainty to remove low-flow (<20% flow quantile) reversals unless 80% of the total upstream flow still exceeds 120% of the downstream flow.
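As an illustrative check of that screening rule with toy numbers (not real gage data):

```r
# keep a low-flow reversal only if 80% of the upstream flow still exceeds
# 120% of the downstream flow, i.e. the apparent loss exceeds what a
# +/-20% rating uncertainty could explain
usQ <- 10   # hypothetical upstream flow below its 20% quantile (m3/s)
dsQ <- 9    # hypothetical downstream flow (apparent loss of 1 m3/s)
keep <- usQ * 0.8 > dsQ * 1.2
keep        # FALSE: within uncertainty, so this reversal is screened out
```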

Code
df_sd <- readNWISmeas(gages, expanded = TRUE) 

df_sd %>%
  filter(gage_height_va < 90) %>%
  mutate(year = year(measurement_dt)) %>%
  ggplot(aes(x = gage_height_va, y = discharge_va, color = year)) +
  geom_point() +
  facet_wrap(~site_no, scales = "free") +
  scale_x_log10() +
  scale_y_log10()
Figure 20: Stage-discharge relationships for all USGS gages in the basin.

So how does that change the flow reversals? We can use quantile regression to see the relationship between reversals and different upstream flows over time (Figure 21). It looks like reversals are happening at lower upstream flows across the board. In the Upper and Lower Suwannee, there is no increase in reversals at the higher end of upstream flows.

Code
df_rev <- df_fg %>%
  mutate(reversal = flow_gain < 0) %>%
  group_by(gage_pair) %>%
  mutate(totalUSQ = usQlag + trib1Qlag + trib2Qlag) %>% # total flow at upstream gages
  mutate(reversal_lowflow = if_else((usQlag < quantile(usQ, 0.2) | # if any of the upstream flows are < 20% quantile
                                       trib1Qlag < quantile(trib1Q, 0.2) |
                                       trib2Qlag < quantile(trib2Q, 0.2)) & 
                                       totalUSQ * 0.8 < dsQ * 1.2, 0, 1)) %>%
  mutate(cum_rev_n = cumsum(replace_na(reversal, 0)),
         cum_rev_vol = cumsum(-replace_na(flow_gain, 0) * replace_na(reversal, 0)),
         cumD = row_number()) %>%
  ungroup()

df_rev %>%
  filter(reversal == TRUE  & reversal_lowflow == 1) %>%
  ggplot(aes(x = date, y = totalUSQ)) +
  geom_point(color = "grey", alpha = 0.5) +
  scale_y_log10() +
  geom_quantile(quantiles = c(0.1, 0.5, 0.9), color = "blue") +
  facet_wrap(~gage_pair, scales = "free") +
  theme_classic() +
  labs(x = "date",
       y = expression("upstream discharge leading to flow reversal"~(m^3~s^{-1})))
Figure 21: Quantile regressions (10, 50, and 90%) of flow reversals on upstream flows.

We can also just look at annual reversal volumes to see if there is a trend over time, and to get an idea of the magnitude of reversal volume relative to annual discharge (Figure 22). Reversals are on the order of a few percent of total discharge at the downstream reach.

Code
# We can also calculate the annual flow reversal volume for each gage pair (in km^3)
df_rev %>%
  group_by(gage_pair, year = year(date)) %>%
  summarise(total_rev_vol = sum(-replace_na(flow_gain, 0) * replace_na(reversal, 0))*86400/1000^3) %>%
  ggplot(aes(x = year, y = total_rev_vol)) +
  geom_line() +
  facet_wrap(~gage_pair, scales = "free_y") +
  theme_classic() +
  labs(x = "year",
       y = expression("annual flow reversal volume"~(km^3)))

# Or annual reversal volume as a fraction of annual discharge
df_rev %>%
  group_by(gage_pair, year = year(date)) %>%
  summarise(total_rev_vol = sum(-replace_na(flow_gain, 0) * replace_na(reversal, 0))*86400/1000^3,
            total_discharge = sum(dsQ)*86400/1000^3) %>%
  ggplot(aes(x = year, y = total_rev_vol/total_discharge)) +
  geom_line() +
  facet_wrap(~gage_pair, scales = "free_y") +
  theme_classic() +
  labs(x = "year",
       y = expression("annual flow reversal volume as fraction of annual discharge"))
(a) Annual reversal volume over time for each springshed
(b) Annual reversal volume as a fraction of downstream discharge over time for each springshed
Figure 22: Annual flow reversal volumes over time.

Given that we have observed 1998 as a changepoint, we can look at the distribution of flows at the upstream gage (on the main stem) that induce flow reversals before and after 1998. It looks like the distribution of flows that induce reversals has shifted to lower flows after 1998 (Figure 23). In each river section except the Middle Suwannee, the median flow necessary to induce a reversal has decreased by approximately 40%, whereas in the Middle Suwannee it has decreased by only approximately 20%.

Code
# for each gage pair, look at the distribution of upstream flows
# that induce a flow reversal
df_rev %>%
  group_by(gage_pair) %>%
  mutate(shift = if_else(year(date) > 1998, "after", "before"),
         # get the quantile of flow each day
         flow_per = rank(usQ)/sum(!is.na(usQ))) %>%
  ungroup() %>%
  filter(reversal == TRUE) %>%
  ggplot(aes(x = usQlag, fill = shift, y = after_stat(scaled))) +
  geom_density(alpha = 0.5) +
  facet_wrap(~gage_pair, ncol = 1, scales = "free") +
  # add a vertical line at the median of the distribution
  geom_vline(data = . %>% group_by(gage_pair, shift) %>% summarise(median_usQ = median(usQlag, na.rm = TRUE)),
             aes(xintercept = median_usQ, color = shift)) +
  # add text with the median value
  geom_text(data = . %>% group_by(gage_pair, shift) %>% summarise(median_usQ = median(usQlag, na.rm = TRUE)),
            aes(x = median_usQ*5, y = 1.2, label = round(median_usQ),
                color = shift)) +
  theme_classic() +
  scale_fill_discrete(labels = c("after 1998", "before 1998"),
                    name = "") +
  guides(color = "none") +
  labs(x = expression("discharge at upstream gage leading to flow reversal"~(m^3~s^{-1})),
       y = "density")
Figure 23: Distribution of upstream flows leading to flow reversals before and after 1998. Medians are shown as vertical lines, with colored numbers showing the median values.

Another way to visualize these data is to look at the mean probability that a reversal occurs at a given discharge. We will bin the discharge into 20 log-bins and calculate the mean probability of a reversal in each bin. We can then look at how this probability has changed before and after the 1998 breakpoint (Figure 25). We observe that there is greater reversal probability post-1998 across nearly the entire range of flows for the Lower Suwannee. This is similar for the Middle and Upper Suwannee, but the Middle Suwannee exhibits a lower likelihood of reversals at the highest discharges (on the order of 10–20% less likely) post-1998, contrasting with the other springsheds.

Code
df_revprob <- df_rev %>%
  group_by(gage_pair) %>%
  drop_na(reversal) %>%
  filter(usQlag != 0) %>%
  mutate(logQ = log10(usQlag),
         bin = Hmisc::cut2(logQ, g = 20, levels.mean = T),# create bins of log Q at each 0.05 quantile
         Qbin = 10^as.numeric(as.character(bin))) %>% 
  mutate(ba = if_else(year(date) > 1998, "after", "before")) %>%
  # calculate the mean probability of reversal within each bin
  group_by(gage_pair, Qbin, ba) %>%
  summarise(reversal = mean(reversal, na.rm = TRUE)) %>%
  ungroup()

# Calculate the difference in probability at each bin between pre and post 1998 for each gage pair
df_revprob_diff <- df_revprob %>%
  pivot_wider(names_from = ba, values_from = reversal) %>%
  mutate(diff = after - before,
         dir = if_else(diff > 0, "increase", "decrease"))

# calculate the total increase in reversal probability for each gage pair
df_revprob_diff %>%
  group_by(gage_pair) %>%
  summarise(total_diff = sum(diff)*100)
gage_pair total_diff
lower 189.270926
middle 1.381781
sf 12.120274
upper 50.067784
Figure 24: Total change in reversal probability (percentage points, summed across bins) for each gage pair, before versus after 1998.
Code
# now plot
p_revprob <- ggplot(data = df_revprob,
                    aes(x = Qbin, y = reversal, color = ba)) +
  scale_x_log10() +
  annotation_logticks(sides = "b") +
  geom_point() +
  geom_line() +
  # shade in the area between the two lines
  # geom_ribbon(data = df_revprob_diff,
  #             aes(ymin = before, ymax = after, x = Qbin, fill = dir),
  #             alpha = 0.5) +
  scale_color_manual(name = "Time period",
                     values = c("before" = "blue", "after" = "red"),
                     labels = c("after 1998", "before 1998")) +
  # scale_fill_manual(name = "Direction of change",
  #                   values = c("increase" = "red", "decrease" = "blue"),
  #                   labels = c("increase", "decrease")) +
  # guides(fill = "none") +
  facet_wrap(~gage_pair, scales = "free") +
  theme_classic() +
  labs(x = expression("discharge at upstream gage"~(m^3~s^{-1})),
       y = expression("reversal probability"))

p_revprob
Figure 25: Mean probability of flow reversal at a given upstream discharge for each gage pair, before and after 1998.

However, these observations are confounded by across-the-board decreases in streamflow over time. We can see that there is considerably less flow at all gages post-1998, and that this is most evident at low-to-medium flows (Figure 26). Thus, disentangling the effect of decreasing flows from the effect of increasing reversals is difficult.

Code
# Exceedance curves of gages before and after 1998
# calculate the exceedance curves for each gage
df_exc <- df %>%
  mutate(ba = if_else(year(date) > 1998, "after", "before")) %>%
  group_by(site_no, ba) %>%
  arrange(site_no, ba, Q) %>%
  mutate(rank = row_number(),
         exceedance = 1 - rank / n()) %>%
  ungroup()
  
# plot the exceedance probability for discharges on log scale, 
p_Qexc <- ggplot(df_exc, aes(x = exceedance, y = Q, color = ba)) +
  geom_line() +
  scale_x_reverse(expand = expansion(mult = c(0, 0.01)),
                  breaks = seq(0,1,0.1)) +
  scale_y_log10(expand = expansion(mult = c(0, 0))) +
  annotation_logticks(sides = "l") +
  facet_wrap(~site_no, scales = "free_y") +
  scale_color_manual(name = "Time period",
                     values = c("before" = "blue", "after" = "red"),
                     labels = c("after 1998", "before 1998")) +
  theme_classic() +
  labs(x = "exceedance probability", y = expression("Q "(m^3~s^{-1})))
p_Qexc
Figure 26: Exceedance curves of gages before and after 1998.

We can also look at how the number of flow reversals per high flow event has changed using an anomaly analysis. We find that the number of flow reversals per high flow event has increased in the Lower and Upper Suwannee, but decreased in the Middle Suwannee and Santa Fe River (Figure 27).

Code
# For each gage pair, look at how many reversals occur per occurrence of a flow
# that exceeds the 90th percentile of the upstream flow -- and how this changes over time
df_rev <- df_rev %>%
  group_by(gage_pair) %>%
  drop_na(usQ, dsQ, trib1Q, trib2Q) %>%
  mutate(usQ90 = quantile(usQ, probs = 0.9, na.rm = TRUE),
         reversal_usQ90 = if_else(usQ > usQ90, reversal, NA)) %>%
  mutate(cum_rev_usQ90 = cumsum(replace_na(reversal_usQ90, 0)),
         # volume of reversal for storms above 90% quantile in million m3/d
         rev_vol_usQ90 = -replace_na(flow_gain, 0) * replace_na(reversal_usQ90, 0)*86400/1E6, 
         ba = if_else(year(date) > 1998, "after", "before")) %>%
  ungroup()

# # Plot that data
# p_rev_usQ90 <- ggplot(df_rev, aes(x = cumD, y = cum_rev_usQ90)) +
#   geom_line() +
#   theme_classic() +
#   facet_wrap(~gage_pair, scales = "free_y") +
#   stat_smooth(method = "lm", se = FALSE, formula = y ~ x + 0) +
#   ggpubr::stat_regline_equation(formula = y ~ x + 0,
#                                 label.y.npc = 0.8) +
#   labs(x = "date",
#        y = "cumulative flow reversals when upstream flow > 90% percentile")
# p_rev_usQ90

# calculate this anomaly analysis
df_revQ90_anom <- df_rev %>%
  group_by(gage_pair) %>%
  nest() %>%
  mutate(lm = map(data, ~lm(cum_rev_usQ90 ~ cumD + 0, data = .)),
         residuals = map2(data, lm, ~mutate(.x, residual = .x$cum_rev_usQ90 - predict(.y, .x)))) %>%
  unnest(residuals) %>%
  ungroup() %>%
  select(-lm, -data)

# plot the residuals over time
p_resTime_revQ90 <- ggplot(df_revQ90_anom, aes(x = date, y = residual, color = gage_pair)) +
  geom_line() +
  scale_color_manual(name = "Springshed",
                     values = c("lower" = "grey", "middle" = "black", "sf" = "blue", "upper" = "red"),
                     labels = c("Lower Suwannee", "Middle Suwannee", "Santa Fe River", "Upper Suwannee")) +
  theme_classic() +
  geom_hline(yintercept = 0) +
  theme(axis.title.x = element_blank(),
        legend.position = c(0.15, 0.8)) +
  labs(y = "flow reversal per Q90 anomaly")
p_resTime_revQ90
Figure 27: Cumulative reversal count anomalies per discharge event above the 90th percentile.

We now do the same sort of general anomaly analysis as we did for rainfall and discharge, but for flow reversals (Figure 28 (a)). We find that flow reversals have become more common in the Lower and Upper Suwannee springsheds since 2000, but that the Middle Suwannee and Santa Fe River springsheds have experienced far less flow reversal since 2000, with the change being particularly drastic for the Middle Suwannee (Figure 28 (b)). The timing of change in the volumetric anomaly is almost identical among the springsheds, and, notably, the magnitudes are similar as well. The volume of flow reversals during storms above the 90% quantile of upstream flow has increased in the Lower and Upper Suwannee, but decreased in the Middle Suwannee and Santa Fe River since 2000 (Figure 28 (c)).
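The anomaly calculation used throughout is the same double-mass-style idea: regress the cumulative series on cumulative days through the origin, and read the residuals as departures from the long-term accumulation rate. A minimal sketch on a synthetic series (base R, not basin data):

```r
cumD <- 1:1000  # cumulative days

# Synthetic cumulative series that accumulates twice as fast in its second half
cum_series <- cumsum(c(rep(1, 500), rep(2, 500)))

# Best-fit accumulation line forced through the origin
fit <- lm(cum_series ~ cumD + 0)

# Residuals flag periods accumulating faster (+) or slower (-) than average
residual <- cum_series - predict(fit, data.frame(cumD = cumD))
```

In this toy series the residuals are negative through the slow first half and swing positive once the rate doubles, which is how the inflections in the anomaly figures are read.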

Code
# Plot the cumulative loss volume over time
p_cumRev <- ggplot(df_rev, aes(x = cumD, y = cum_rev_vol*86400/1000^3)) +
  geom_line() +
  theme_classic() +
  facet_wrap(~gage_pair, scales = "free_y") +
  stat_smooth(method = "lm", se = FALSE, formula = y ~ x + 0) +
  ggpubr::stat_regline_equation(formula = y ~ x + 0,
                                label.y.npc = 0.8) +
  labs(x = "cumulative days",
       y = expression("cumulative flow reversal volume"~(km^3)))
p_cumRev

# now we calculate the anomalies, or residuals, from the best fit line for the cumulative flow reversals
df_rev_anom <- df_rev %>%
  group_by(gage_pair) %>%
  nest() %>%
  mutate(lm = map(data, ~lm(cum_rev_vol ~ cumD + 0, data = .)),
         residuals = map2(data, lm, ~mutate(.x, residual = .x$cum_rev_vol - predict(.y, .x)))) %>%
  unnest(residuals) %>%
  ungroup() %>%
  select(-lm, -data)

# plot the residuals over time
p_resTime_rev <- ggplot(df_rev_anom, aes(x = date, y = residual*86400/1000^3, color = gage_pair)) +
  geom_line() +
  scale_color_manual(name = "Springshed",
                     values = c("lower" = "grey", "middle" = "black", "sf" = "blue", "upper" = "red"),
                     labels = c("Lower Suwannee", "Middle Suwannee", "Santa Fe River", "Upper Suwannee")) +
  theme_classic() +
  geom_hline(yintercept = 0) +
  theme(axis.title.x = element_blank(),
        legend.position = c(0.15, 0.8)) +
  labs(y = expression("flow reversal anomaly"~(km^3)))
p_resTime_rev

# Plot the distribution of flow reversals when upstream flow is greater than the 90th percentile, before and after 1998
df_rev %>%
  filter(reversal_usQ90 == 1) %>%
  ggplot(aes(x = rev_vol_usQ90, fill = ba, y = after_stat(scaled))) +
  geom_density(alpha = 0.5) +
  facet_wrap(~gage_pair, ncol = 1, scales = "free") +
  # add a vertical line at the median of the distribution
  geom_vline(data = . %>% group_by(gage_pair, ba) %>% summarise(median_vol = median(rev_vol_usQ90, na.rm = TRUE)),
             aes(xintercept = median_vol, color = ba)) +
  # add text with the median value
  geom_text(data = . %>% group_by(gage_pair, ba) %>% summarise(median_vol = median(rev_vol_usQ90, na.rm = TRUE)),
            aes(x = median_vol*8, y = 0.8, label = round(median_vol, 1),
                color = ba)) +
  theme_classic() +
  scale_fill_discrete(labels = c("after 1998", "before 1998"),
                    name = "") +
  guides(color = "none") +
  labs(x = expression("reversal volume during storms"~("million"~m^3)),
       y = "density")
(a) Cumulative volumetric flow reversal by cumulative time by springshed (1933–2022)
(b) Volumetric flow reversal anomalies over time by springshed (1933–2022)
(c) Volume of flow reversal during storms above the 90% quantile of upstream flow by springshed before and after 1998.
Figure 28: Flow reversal anomalies over time.

6 Anomalies of variably sized storms and flows

If we look at how the frequency of the largest storms has changed over time (using the exact same methods as before), we see a broad decrease in the number of large storms across the board in the PRISM data, but this is completely contrasted by the Lake City data.
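The storm definition is straightforward: percentile thresholds are computed from wet days only (`ppt > 0`), and a day counts as a storm when rainfall meets or exceeds the threshold. A base-R sketch with made-up rainfall values:

```r
# Hypothetical daily rainfall (mm); zeros are dry days and are excluded
# from the threshold calculation, as in the workflow
ppt <- c(0, 3, 0, 12, 7, 0, 25, 0, 2, 40, 0, 9)

pct80 <- quantile(ppt[ppt > 0], probs = 0.8, na.rm = TRUE)

storm80     <- ifelse(ppt >= pct80, 1, 0)  # 1 = storm day
cum_storm80 <- cumsum(storm80)
```

The same pattern repeats with `probs = 0.9` and `probs = 0.95` for the larger storm classes.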

Code
# First, we need to calculate the cumulative anomalies of different sized storm events
# the storm events will be defined by the 80th, 90th, and 95th percentiles of the
# daily rainfall data for each subwatershed. 
df_storms <- df %>%
    bind_rows(mutate(lc_rain, site_no = "LakeCity", name = "LakeCity") %>%
              select(site_no, name, ppt = `LCRain (mm)`, date = Date, cum_D = Day) %>%
              mutate(cum_ppt = cumsum(ppt))) %>%
  group_by(name) %>%
  nest() %>%
  mutate(pct80 = map(data, ~quantile(.x$ppt[.x$ppt > 0], probs = 0.8, na.rm = T)),
         pct90 = map(data, ~quantile(.x$ppt[.x$ppt > 0], probs = 0.9, na.rm = T)),
         pct95 = map(data, ~quantile(.x$ppt[.x$ppt > 0], probs = 0.95, na.rm = T)),
         storms = map(data, ~mutate(.x, storm80 = ifelse(ppt >= pct80, 1, 0),
                                    storm90 = ifelse(ppt >= pct90, 1, 0),
                                    storm95 = ifelse(ppt >= pct95, 1, 0)))
  ) %>%
  unnest(storms) %>%
  ungroup() %>%
  select(-data, -pct80, -pct90, -pct95) %>%
  # calculate cumulative count of each storm event
  group_by(name) %>%
  arrange(name, date) %>%
  mutate(cum_storm80 = cumsum(storm80),
         cum_storm90 = cumsum(storm90),
         cum_storm95 = cumsum(storm95))

# Now do the cumulative anomaly analysis of the counts of storm events
# for each subwatershed
p_cum_storms <- ggplot(df_storms, aes(x = cum_D)) +
  geom_line(aes(y = cum_storm80, color = "80th percentile")) +
  geom_line(aes(y = cum_storm90, color = "90th percentile")) +
  geom_line(aes(y = cum_storm95, color = "95th percentile")) +
  facet_wrap(~name, scales = "free_y") +
  theme_classic() +
  scale_color_manual(name = "", values = c("80th percentile" = "blue", 
                                           "90th percentile" = "red", 
                                           "95th percentile" = "black")) +
  labs(y = "cumulative storm event count")
# p_cum_storms

# calculate the anomalies of the storm event counts
df_storms_anom <- df_storms %>%
  group_by(name) %>%
  nest() %>%
  mutate(lm80 = map(data, ~lm(cum_storm80 ~ cum_D + 0, data = .)),
         residuals80 = map2(data, lm80, ~mutate(.x, res_storm80 = .x$cum_storm80 - predict(.y, .x))),
         lm90 = map(data, ~lm(cum_storm90 ~ cum_D + 0, data = .)),
         residuals90 = map2(data, lm90, ~transmute(.x, res_storm90 = .x$cum_storm90 - predict(.y, .x))),
         lm95 = map(data, ~lm(cum_storm95 ~ cum_D + 0, data = .)),
         residuals95 = map2(data, lm95, ~transmute(.x, res_storm95 = .x$cum_storm95 - predict(.y, .x))) ) %>%
  unnest(c(residuals80, residuals90, residuals95)) %>%
  ungroup() %>%
  select(-lm80, -lm90, -lm95, -data)

# plot the anomalies over time
p_res_storm_time <- ggplot(df_storms_anom, aes(x = date)) +
  geom_line(aes(y = res_storm80, color = "80th percentile")) +
  geom_line(aes(y = res_storm90, color = "90th percentile")) +
  geom_line(aes(y = res_storm95, color = "95th percentile")) +
  facet_wrap(~name, scales = "free_y") +
  geom_hline(yintercept = 0) +
  theme_classic() +
  scale_color_manual(name = "", values = c("80th percentile" = "blue", 
                                           "90th percentile" = "red", 
                                           "95th percentile" = "black")) +
  labs(y = "storm event count anomaly")
p_res_storm_time
Figure 29: Cumulative storm anomalies for each subwatershed and the Lake City gage over time for storms greater than the 80th, 90th, and 95th percentile of daily rainfall.

We can do the same thing for the other measured rainfall gages. The measured data do not appear to align well with the PRISM data: there are only small anomalies over the period of record, and the patterns are not strong (Figure 30), consistent with the overall time series of rainfall anomalies (Figure 12).

Code
# Now do the cumulative anomaly analysis of the counts of storm events for each gage
df_storms_meas <- rain_meas_fill %>%
  select(site, date, ppt = ppt_clean) %>%
  filter(date >= ymd("1933-01-01")) %>%
  mutate(ppt = if_else(is.na(ppt), 0, ppt)) %>%
  group_by(site) %>%
  arrange(site, date) %>%
  mutate(cum_D = row_number()) %>%
  nest() %>%
  mutate(pct80 = map(data, ~quantile(.x$ppt[.x$ppt > 0], probs = 0.8, na.rm = T)),
         pct90 = map(data, ~quantile(.x$ppt[.x$ppt > 0], probs = 0.9, na.rm = T)),
         pct95 = map(data, ~quantile(.x$ppt[.x$ppt > 0], probs = 0.95, na.rm = T)),
         storms = map(data, ~mutate(.x, storm80 = ifelse(ppt >= pct80, 1, 0),
                                    storm90 = ifelse(ppt >= pct90, 1, 0),
                                    storm95 = ifelse(ppt >= pct95, 1, 0))
         ) ) %>%
  unnest(storms) %>%
  ungroup() %>%
  select(-data, -pct80, -pct90, -pct95) %>%
  # calculate cumulative count of each storm event
  group_by(site) %>%
  arrange(site, date) %>%
  mutate(cum_storm80 = cumsum(storm80),
         cum_storm90 = cumsum(storm90),
         cum_storm95 = cumsum(storm95))

# calculate the anomalies of the storm event counts
df_storms_meas_anom <- df_storms_meas %>%
  group_by(site) %>%
  nest() %>%
  mutate(lm80 = map(data, ~lm(cum_storm80 ~ cum_D + 0, data = .)),
         residuals80 = map2(data, lm80, ~mutate(.x, res_storm80 = .x$cum_storm80 - predict(.y, .x))),
         lm90 = map(data, ~lm(cum_storm90 ~ cum_D + 0, data = .)),
         residuals90 = map2(data, lm90, ~transmute(.x, res_storm90 = .x$cum_storm90 - predict(.y, .x))),
         lm95 = map(data, ~lm(cum_storm95 ~ cum_D + 0, data = .)),
         residuals95 = map2(data, lm95, ~transmute(.x, res_storm95 = .x$cum_storm95 - predict(.y, .x))) ) %>%
  unnest(c(residuals80, residuals90, residuals95)) %>%
  ungroup() %>%
  select(-lm80, -lm90, -lm95, -data)

# plot the anomalies over time
p_res_storm_time_meas <- ggplot(df_storms_meas_anom, aes(x = date)) +
  geom_line(aes(y = res_storm80, color = "80th percentile")) +
  geom_line(aes(y = res_storm90, color = "90th percentile")) +
  geom_line(aes(y = res_storm95, color = "95th percentile")) +
  facet_wrap(~site, scales = "free_y") +
  geom_hline(yintercept = 0) +
  theme_classic() +
  scale_color_manual(name = "", values = c("80th percentile" = "blue", 
                                           "90th percentile" = "red", 
                                           "95th percentile" = "black")) +
  labs(y = "storm event count anomaly")
p_res_storm_time_meas
Figure 30: Cumulative storm anomalies for measured rainfall gages over time for storms greater than the 80th, 90th, and 95th percentile of daily rainfall.

Likewise we can do the same for discharge, looking at both the changing prevalence of low discharges (<20th percentile) and high discharges (>80th percentile) over time. Let’s first look at just the changing counts of varying discharges over time (Figure 31). We see coherence across gages: a drastic recent increase in low discharges (less than median flow) and a concomitant decrease in high discharge events, although the latter is less pronounced and only strongly evident for the Santa Fe (1500 and 2500 sites), the lower Suwannee (3500 site), and the upper Suwannee (5500 site).

Code
# do the exact same thing, but for discharge percentiles (20, 50, 80, 90)
df_Q_flows <- df %>%
  group_by(site_no) %>%
  nest() %>%
  mutate(pct20 = map(data, ~quantile(.x$Q, probs = 0.2, na.rm = T)),
         pct50 = map(data, ~quantile(.x$Q, probs = 0.5, na.rm = T)),
         pct80 = map(data, ~quantile(.x$Q, probs = 0.8, na.rm = T)),
         pct90 = map(data, ~quantile(.x$Q, probs = 0.9, na.rm = T)),
         flows = map(data, ~mutate(.x, flow20 = ifelse(Q <= pct20, 1, 0),
                                    flow50 = ifelse(Q <= pct50, 1, 0),
                                    flow80 = ifelse(Q >= pct80, 1, 0),
                                    flow90 = ifelse(Q >= pct90, 1, 0))
  ) ) %>%
  unnest(flows) %>%
  ungroup() %>%
  select(-data, -pct20, -pct50, -pct80, -pct90) %>%
  # calculate cumulative count of each flow event
  group_by(site_no) %>%
  arrange(site_no, date) %>%
  mutate(cum_flow20 = cumsum(flow20),
         cum_flow50 = cumsum(flow50),
         cum_flow80 = cumsum(flow80),
         cum_flow90 = cumsum(flow90))

# Now do the cumulative anomaly analysis of the counts of flow events
# for each subwatershed
# p_cum_flows <- ggplot(df_Q_flows, aes(x = cum_D)) +
#   geom_line(aes(y = cum_flow20, color = "20th percentile")) +
#   geom_line(aes(y = cum_flow50, color = "50th percentile")) +
#   geom_line(aes(y = cum_flow80, color = "80th percentile")) +
#   geom_line(aes(y = cum_flow90, color = "90th percentile")) +
#   facet_wrap(~site_no, scales = "free_y") +
#   theme_classic() +
#   scale_color_manual(name = "", values = c("20th percentile" = "blue", 
#                                            "50th percentile" = "red", 
#                                            "80th percentile" = "black",
#                                            "90th percentile" = "green")) +
#   labs(y = "cumulative flow event count")
# p_cum_flows

# calculate the anomalies of the flow event counts
df_Q_flows_anom <- df_Q_flows %>%
  filter(site_no != "LakeCity") %>%
  group_by(site_no) %>%
  nest() %>%
  mutate(lm20 = map(data, ~lm(cum_flow20 ~ cum_D + 0, data = .)),
         residuals20 = map2(data, lm20, ~mutate(.x, res_flow20 = .x$cum_flow20 - predict(.y, .x))),
         lm50 = map(data, ~lm(cum_flow50 ~ cum_D + 0, data = .)),
         residuals50 = map2(data, lm50, ~transmute(.x, res_flow50 = .x$cum_flow50 - predict(.y, .x))),
         lm80 = map(data, ~lm(cum_flow80 ~ cum_D + 0, data = .)),
         residuals80 = map2(data, lm80, ~transmute(.x, res_flow80 = .x$cum_flow80 - predict(.y, .x))),
         lm90 = map(data, ~lm(cum_flow90 ~ cum_D + 0, data = .)),
         residuals90 = map2(data, lm90, ~transmute(.x, res_flow90 = .x$cum_flow90 - predict(.y, .x))) ) %>%
  unnest(c(residuals20, residuals50, residuals80, residuals90)) %>%
  ungroup() %>%
  select(-lm20, -lm50, -lm80, -lm90, -data)

# plot the anomalies over time
p_res_flow_time <- ggplot(df_Q_flows_anom, aes(x = date)) +
  geom_line(aes(y = res_flow20, color = "20th percentile")) +
  geom_line(aes(y = res_flow50, color = "50th percentile")) +
  geom_line(aes(y = res_flow80, color = "80th percentile")) +
  geom_line(aes(y = res_flow90, color = "90th percentile")) +
  facet_wrap(~site_no, scales = "free_y") +
  geom_hline(yintercept = 0) +
  theme_classic() +
  # want to use color blind friendly colors
  scale_color_manual(name = "", values = c("20th percentile" = "black",  
                                           "50th percentile" = "#0072B2",
                                           "80th percentile" = "#E69F00",
                                           "90th percentile" = "#D55E00")) +
  labs(y = "flow event count anomaly")
p_res_flow_time
Figure 31: Cumulative discharge event anomalies for each USGS gage over time for discharge less than the 20th and 50th percentiles and greater than the 80th and 90th percentiles.

Instead of changes in counts, we can look at changes in volume or depth. This paints a similar picture as before, but highlights the importance of the change in flow volume in the Santa Fe and lower Suwannee (Figure 32). By contrast, the lower discharges have changed very little in volume or depth over time, despite their increasing frequency.
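Converting a volumetric anomaly to an equivalent depth is just a division by watershed area with unit bookkeeping (1 mi² = 2.58999 km², and 1 km³ spread over 1 km² is 1000 m). A sketch with a hypothetical anomaly and drainage area (the numbers are illustrative, not from NWIS):

```r
res_vol_km3 <- 0.5   # hypothetical cumulative-volume anomaly, km^3
area_mi2    <- 965   # hypothetical drainage area, mi^2 (NWIS reports mi^2)

area_km2 <- area_mi2 * 2.58999        # mi^2 -> km^2

# km^3 / km^2 = km; multiply by 1000 for meters of equivalent depth
res_depth_m <- res_vol_km3 / area_km2 * 1000
```

Note the conversion factor is 2.58999 (which is already 1.60934²), applied once.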

Code
# Now the same for discharge, but want to get in units of meters for easier comparison,
# need to divide by watershed areas
# first have to do the analysis for cumulative volumes instead of counts
df_Q_flows2 <- df %>%
  group_by(site_no) %>%
  nest() %>%
  mutate(pct20 = map(data, ~quantile(.x$Q, probs = 0.2, na.rm = T)),
         pct50 = map(data, ~quantile(.x$Q, probs = 0.5, na.rm = T)),
         pct80 = map(data, ~quantile(.x$Q, probs = 0.8, na.rm = T)),
         pct90 = map(data, ~quantile(.x$Q, probs = 0.9, na.rm = T)),
         flows = map(data, ~mutate(.x, flow20 = ifelse(Q <= pct20, Q, 0),
                                    flow50 = ifelse(Q <= pct50, Q, 0),
                                    flow80 = ifelse(Q >= pct80, Q, 0),
                                    flow90 = ifelse(Q >= pct90, Q, 0))
  ) ) %>%
  unnest(flows) %>%
  ungroup() %>%
  select(-data, -pct20, -pct50, -pct80, -pct90) %>%
  # calculate cumulative volume of each flow class
  group_by(site_no) %>%
  arrange(site_no, date) %>%
  mutate(cum_flow20 = cumsum(flow20) * 86400 / 1e9, #km3
         cum_flow50 = cumsum(flow50) * 86400 / 1e9,
         cum_flow80 = cumsum(flow80) * 86400 / 1e9,
         cum_flow90 = cumsum(flow90) * 86400 / 1e9)

# Now do the cumulative anomaly analysis of the volumes of flow events
# for each subwatershed
# p_cum_flows2 <- ggplot(df_Q_flows2, aes(x = cum_D)) +
#   geom_line(aes(y = cum_flow20, color = "20th percentile")) +
#   geom_line(aes(y = cum_flow50, color = "50th percentile")) +
#   geom_line(aes(y = cum_flow80, color = "80th percentile")) +
#   geom_line(aes(y = cum_flow90, color = "90th percentile")) +
#   facet_wrap(~site_no, scales = "free_y") +
#   theme_classic() +
#   scale_color_manual(name = "", values = c("20th percentile" = "blue", 
#                                            "50th percentile" = "red", 
#                                            "80th percentile" = "black",
#                                            "90th percentile" = "green")) +
#   labs(y = "cumulative flow volume (km^3)")
# p_cum_flows2

# calculate the anomalies of the flow event volumes
df_Q_flows_anom2 <- df_Q_flows2 %>%
  filter(site_no != "LakeCity") %>%
  group_by(site_no) %>%
  nest() %>%
  mutate(lm20 = map(data, ~lm(cum_flow20 ~ cum_D + 0, data = .)),
         residuals20 = map2(data, lm20, ~mutate(.x, res_flow20 = .x$cum_flow20 - predict(.y, .x))),
         lm50 = map(data, ~lm(cum_flow50 ~ cum_D + 0, data = .)),
         residuals50 = map2(data, lm50, ~transmute(.x, res_flow50 = .x$cum_flow50 - predict(.y, .x))),
         lm80 = map(data, ~lm(cum_flow80 ~ cum_D + 0, data = .)),
         residuals80 = map2(data, lm80, ~transmute(.x, res_flow80 = .x$cum_flow80 - predict(.y, .x))),
         lm90 = map(data, ~lm(cum_flow90 ~ cum_D + 0, data = .)),
         residuals90 = map2(data, lm90, ~transmute(.x, res_flow90 = .x$cum_flow90 - predict(.y, .x))) ) %>%
  unnest(c(residuals20, residuals50, residuals80, residuals90)) %>%
  ungroup() %>%
  select(-lm20, -lm50, -lm80, -lm90, -data)

# plot the anomalies over time, but in units of m by dividing by watershed area
p_res_flow_time2 <- df_Q_flows_anom2 %>% 
  left_join(dataRetrieval::readNWISsite(c("02323500", "02322500", "02320500", "02321500",
                                          "02319500", "02315500", "02317500", "02319000")) %>%
              select(site_no, area_mi2 = drain_area_va)) %>%
  mutate(areasqkm = area_mi2 * 2.58999) %>% # 1 mi^2 = 2.58999 km^2
  mutate(res_flow20 = res_flow20 / areasqkm * 1000, # m
         res_flow50 = res_flow50 / areasqkm * 1000,
         res_flow80 = res_flow80 / areasqkm * 1000,
         res_flow90 = res_flow90 / areasqkm * 1000) %>%
  ggplot(aes(x = date)) +
  geom_line(aes(y = res_flow20, color = "20th percentile")) +
  geom_line(aes(y = res_flow50, color = "50th percentile")) +
  geom_line(aes(y = res_flow80, color = "80th percentile")) +
  geom_line(aes(y = res_flow90, color = "90th percentile")) +
  facet_wrap(~site_no, scales = "free_y") +
  theme_classic() +
  scale_color_manual(name = "", values = c("20th percentile" = "black",  
                                           "50th percentile" = "#0072B2",
                                           "80th percentile" = "#E69F00",
                                           "90th percentile" = "#D55E00")) +
  geom_hline(yintercept = 0) +
  labs(y = "flow volume anomaly (m)")
p_res_flow_time2
Figure 32: Cumulative discharge depth anomalies for each USGS gage over time for discharge less than the 20th and 50th percentiles and greater than the 80th and 90th percentiles.

7 Brown-out analysis

Now we do the same analysis for brown-outs, which are defined as days when flow gain is positive while the previous day’s upstream flow is above its 80th percentile. We see that brown-outs have been declining in recent years for all springsheds except the Middle Suwannee (Figure 33).
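A minimal base-R sketch of the brown-out flag on synthetic numbers; `usQlag` here is the previous day's upstream flow, mirroring the lagged comparison in the code below:

```r
usQ       <- c(10, 60, 90, 85, 20, 70, 95, 15)  # hypothetical upstream flow
flow_gain <- c( 2, -1,  3,  4, -2,  1,  5,  0)  # hypothetical downstream gain

usQ80  <- quantile(usQ, probs = 0.8, na.rm = TRUE)
usQlag <- c(NA, head(usQ, -1))  # previous day's upstream flow

# Brown out: positive flow gain despite high upstream flow the day before
brown_out <- ifelse(usQlag > usQ80 & flow_gain > 0, 1, 0)
brown_out[is.na(brown_out)] <- 0
cum_brown <- cumsum(brown_out)
```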

Code
# Brown out analysis ------------------------------------------------------
# if flow gain is greater than 0 and the previous day's upstream flow exceeds
# its 80th percentile, flag the day as a brown out
df_brown <- df_fg %>%
  group_by(gage_pair) %>%
  mutate(cumD = row_number()) %>%
  mutate(usQ80 = quantile(usQ, probs = 0.8, na.rm = TRUE),
         brown_out = if_else(usQlag > usQ80 & flow_gain > 0, 1, 0)) %>%
  mutate(brown_out = if_else(is.na(brown_out), 0, brown_out)) %>%
  mutate(cum_brown = cumsum(brown_out)) %>%
  ungroup()

# plot the cumulative brown outs over time
p_cumBrown <- ggplot(df_brown, aes(x = cumD, y = cum_brown)) +
  geom_line() +
  theme_classic() +
  facet_wrap(~gage_pair, scales = "free_y") +
  stat_smooth(method = "lm", se = FALSE, formula = y ~ x + 0) +
  ggpubr::stat_regline_equation(formula = y ~ x + 0,
                                label.y.npc = 0.8) +
  labs(x = "date",
       y = "cumulative brown outs")

# calculate this anomaly analysis
df_brown_anom <- df_brown %>%
  group_by(gage_pair) %>%
  nest() %>%
  mutate(lm = map(data, ~lm(cum_brown ~ cumD + 0, data = .)),
         residuals = map2(data, lm, ~mutate(.x, residual = .x$cum_brown - predict(.y, .x)))) %>%
  unnest(residuals) %>%
  ungroup() %>%
  select(-lm, -data)

# plot the residuals over time
p_resTime_brown <- ggplot(df_brown_anom, aes(x = date, y = residual, color = gage_pair)) +
  geom_line() +
  scale_color_manual(name = "Springshed",
                     values = c("lower" = "grey", "middle" = "black", "sf" = "blue", "upper" = "red"),
                     labels = c("Lower Suwannee", "Middle Suwannee", "Santa Fe River", "Upper Suwannee")) +
  theme_classic() +
  geom_hline(yintercept = 0) +
  theme(axis.title.x = element_blank(),
        legend.position = c(0.15, 0.8)) +
  labs(y = "brown out anomaly")
p_resTime_brown
Figure 33: Brown-out count anomalies for each springshed over time.

8 Slackwater analysis

And finally, the anomaly analysis for slackwater events. Slackwater events are defined as days when flow gain is positive but less than the 20th percentile of flow gains. We see that slackwater events have increased dramatically in recent years for all springsheds (Figure 34).
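The slackwater flag is the mirror image of the reversal analysis; a base-R sketch on hypothetical gains (note the threshold is the 20th percentile of all flow gains, negatives included):

```r
flow_gain <- c(5, 0.2, 8, 0.1, 6, -1, 0.3, 7, 9, 4)  # hypothetical gains

gain20 <- quantile(flow_gain, probs = 0.2, na.rm = TRUE)

# Positive, but unusually small, gains count as slackwater days
slackwater <- ifelse(flow_gain > 0 & flow_gain < gain20, 1, 0)
cum_sw <- cumsum(slackwater)
```

If reversals are frequent enough that the 20th percentile of gains is negative, no day can qualify, so the definition implicitly assumes mostly gaining conditions.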

Code
# Slackwater anomalies ----------------------------------------------------
# define slackwater as days when flow gain is positive but less than the
# 20th percentile of flow gains
df_sw <- df_fg %>%
  group_by(gage_pair) %>%
  mutate(cumD = row_number()) %>%
  mutate(flow_gain20 = quantile(flow_gain, probs = 0.2, na.rm = TRUE),
         slackwater = if_else(flow_gain > 0 & flow_gain < flow_gain20, 1, 0)) %>%
  mutate(slackwater = if_else(is.na(slackwater), 0, slackwater)) %>%
  mutate(cum_sw = cumsum(slackwater)) %>%
  ungroup()

# plot the cumulative slackwater over time
p_cumSW <- ggplot(df_sw, aes(x = cumD, y = cum_sw)) +
  geom_line() +
  theme_classic() +
  facet_wrap(~gage_pair, scales = "free_y") +
  stat_smooth(method = "lm", se = FALSE, formula = y ~ x + 0) +
  ggpubr::stat_regline_equation(formula = y ~ x + 0,
                                label.y.npc = 0.8) +
  labs(x = "date",
       y = "cumulative slackwater")

# calculate this anomaly analysis
df_sw_anom <- df_sw %>%
  group_by(gage_pair) %>%
  nest() %>%
  mutate(lm = map(data, ~lm(cum_sw ~ cumD + 0, data = .)),
         residuals = map2(data, lm, ~mutate(.x, residual = .x$cum_sw - predict(.y, .x)))) %>%
  unnest(residuals) %>%
  ungroup() %>%
  select(-lm, -data)

# plot the residuals over time
p_resTime_sw <- ggplot(df_sw_anom, aes(x = date, y = residual, color = gage_pair)) +
  geom_line() +
  scale_color_manual(name = "Springshed",
                     values = c("lower" = "grey", "middle" = "black", "sf" = "blue", "upper" = "red"),
                     labels = c("Lower Suwannee", "Middle Suwannee", "Santa Fe River", "Upper Suwannee")) +
  theme_classic() +
  geom_hline(yintercept = 0) +
  theme(axis.title.x = element_blank(),
        legend.position = c(0.15, 0.8)) +
  labs(y = "slackwater anomaly")
p_resTime_sw
Figure 34: Slackwater count anomalies for each springshed over time.